Abstract: Caching and multicasting at base stations are two
promising approaches to support massive content delivery over
wireless networks. However, existing scheduling designs do not
make full use of the advantages of the two approaches. In this
paper, we consider the optimal dynamic multicast scheduling
to jointly minimize the average delay, power, and fetching
costs for cache-enabled content-centric wireless networks. We
formulate this stochastic optimization problem as an infinite
horizon average cost Markov decision process (MDP). It is
well-known to be a difficult problem due to the curse of dimensionality,
and there generally exist only numerical solutions. By using the
relative value iteration algorithm and the special structures of
the request queue dynamics, we analyze the properties of the
value function and the state-action cost function of the MDP
for both the uniform and nonuniform channel cases. Based
on these properties, we show that the optimal policy, which is
adaptive to the request queue state, has a switch structure in the
uniform case and a partial switch structure in the nonuniform
case. Moreover, in the uniform case with two contents, we show
that the switch curve is monotonically non-decreasing. Then, by
exploiting these structural properties of the optimal policy, we
propose two low-complexity optimal algorithms. Motivated by
the switch structures of the optimal policy, to further reduce the
complexity, we also propose a low-complexity suboptimal policy,
which possesses similar structural properties to the optimal
policy, and develop a low-complexity algorithm to compute this
policy.

Index Terms: Cache, content-centric, multicast, dynamic programming,
structural results, queueing.

I. Introduction 

The demand for wireless communication services has been
shifting from connection-centric communications, such as
traditional voice telephony and messaging, to content-centric
communications, such as video streaming, social networking,
and content sharing. Moreover, wireless data traffic is
expected to grow at a compound annual growth rate of 57
percent from 2014 to 2019, reaching 24.3 exabytes per month
by 2019 [1]. These phenomena propel the development of
content-centric wireless networks [2].

Recently, to support the dramatic growth of wireless
data traffic, caching at base stations (BSs) has been proposed
as a promising approach for massive content delivery and
extensively studied in the literature [3]-[6]. Specifically, in
[3], the authors introduce the concept of Femtocaching and
study content placement at the small BSs to minimize the
average content access delay. In [4], the authors consider
joint request routing and caching in small-cell networks, and
propose approximate algorithms to maximize the requests
served by small BSs. In [5], the authors study the joint
optimization of cache control and playback buffer management
for video streaming in multi-cell multi-user MIMO cellular
networks. Reference [6] analyzes the performance (e.g., the
outage probability and the average delivery rate) of
cache-enabled small-cell networks for given caching strategies.
However, most existing literature [3]-[6] considers point-to-point
unicast transmission, which can only help to reduce
the backhaul burden without effectively relieving the "on air"
congestion. The inherent broadcast nature of the wireless medium,
which is the major distinction of wireless communications
from wired communications, is not fully exploited.

On the other hand, enabling multicast service at BSs is
an efficient way to deliver contents to multiple requesters
simultaneously by effectively utilizing the inherent broadcast
nature of the wireless medium [7]. References [8] and [9] consider
scheduling problems for multicasting inelastic flows (with
strict deadlines) in wireless networks. In [10], the authors
study the asymptotic capacity of delay-constrained multicast
in large scale mobile ad hoc networks.

In view of the benefits of caching and multicasting, the
joint design of the two promising techniques is expected to
achieve superior performance for massive content delivery
in wireless networks [11]-[13]. Specifically, the authors in
[11] study coded multicasting for inelastic services under
a given coded caching scheme in a single-cell network. In
[12], the authors consider multicasting for inelastic services
in cache-enabled small-cell networks. An approximate caching
algorithm with performance guarantee and a heuristic caching
algorithm are proposed to reduce the service cost of a fixed
multicast transmission strategy. In [13], the authors consider
multicasting for inelastic services in cache-enabled multi-cell
networks. A joint throughput-optimal caching and scheduling
algorithm is proposed to maximize the service rates of inelastic
services. However, [11]-[13] assume that the users have
uniform channel conditions, and hence all the users can be
served simultaneously by a single multicast transmission. It
remains unclear how to design multicast scheduling for given
cache placement to make full use of the broadcast nature of the
wireless medium when users have nonuniform channel conditions.
Moreover, for delay-sensitive services without strict
deadlines (i.e., elastic services), it is unknown how to design
optimal multicast scheduling for given cache placement by
exploiting the tradeoff between the delay cost and the service
cost.

For cache-enabled content-centric wireless networks, there 
are two important phases, i.e., content placement and content 
delivery, and the two phases in general happen on different
timescales Q. In the existing literature on the joint design of 
caching and multicasting [11]-[13], the authors either focus
on the optimization of one phase for a fixed strategy of the
other phase [11], [12] or consider that content placement and
multicast transmission are in the same timescale [13]. To
the best of our knowledge, the optimal design for the two 
timescale cache placement and multicast scheduling problem 
is still unknown. Therefore, as a first and necessary step for 
the joint two timescale design, in this work, we focus on the 
optimal multicast scheduling for given cache placement. Based 
on the small timescale problem considered here, we would like 
to consider the joint two timescale design in future work. 

In this paper, we consider a cache-enabled content-centric 
wireless network with one BS, K users (with possibly different 
channel conditions) and M contents (with possibly different 
content sizes). The BS stores a certain number of contents 
in its cache and can fetch any uncached content from the 
core network through a backhaul link, with a fetching cost 
depending on the content size. In each slot, the BS schedules 
one content for multicasting to serve the users’ pending 
requests, with a power cost depending on both the content 
size and the channel conditions of the users being served. We 
consider the optimal dynamic multicast scheduling to jointly 
minimize the average delay, power, and fetching costs. We 
formulate the stochastic optimization problem as an infinite 
horizon average cost Markov decision process (MDP) [14].
There are several technical challenges. 

• Optimality analysis: The infinite horizon average cost
MDP is well-known to be a difficult problem due to the
curse of dimensionality [14]. While dynamic programming
represents a systematic approach for MDPs, there generally
exist only numerical solutions, which do not typically offer
many design insights, and are usually not practical due to the
curse of dimensionality. Therefore, it is desirable to analyze
the structures of the optimal policies. Specifically, the problem
considered in this work can be treated as the problem of
scheduling a broadcast server to parallel queues with general
arrivals and switching costs. Several existing works have
studied related problems [15]-[17]. In particular, [15] and
[16] consider the problems of scheduling a broadcast server
to a two-queue system with general arrivals and a
multiple-queue system with symmetric arrivals, respectively. Reference
[17] studies the problem of scheduling a single server (without
broadcast capability) to two queues with switching costs. Note
that the switching costs, which relate to the fetching costs
in our problem, are not considered in [15], [16], and the
broadcast capability is not considered for the server in [17].
To the best of our knowledge, the structural properties of the
optimal scheduling of a broadcast server to parallel queues
with general arrivals and switching costs remain unknown,
and characterizing them is highly nontrivial.

• Algorithm design: Standard brute-force algorithms such 
as value iteration and policy iteration M to MDPs are usually 
impractical for implementation due to the curse of dimensionality,
and cannot exploit the structural properties of the optimal
policy. To reduce the complexity, several existing works
propose structured optimal algorithms which incorporate the 


structural properties into the standard algorithms m, M- 
However, these structured optimal algorithms still suffer from 
the curse of dimensionality, which is embedded in the optimal 
control designs for MDPs and generally cannot be broken 
without any loss of optimality. On the other hand, the structural 
properties of the optimal policy may be one key reason for its 
good performance. Therefore, it is highly desirable to develop 
low-complexity suboptimal solutions, which can relieve the 
curse of dimensionality, while maintaining similar structural 
properties to optimal policies. However, for most existing 
approximate approaches Eol, m, there is (in general) no 
guarantee that the obtained suboptimal policies have similar 
structural properties to the optimal policies. To the best of 
our knowledge, the design of low complexity suboptimal 
solutions of similar structural properties to the optimal policies 
is unknown. 

In this paper, we consider the uniform and nonuniform channel
cases. By using the relative value iteration algorithm (RVIA)
02 and the special structures of the request queue dynamics, 
as well as the power and fetching costs, we analyze the properties
of the value function and the state-action cost function of
the MDP for both the uniform and nonuniform cases. Based 
on these properties, for the uniform case, we show that the 
optimal policy has a switch structure. In particular, the request 
queue state space is divided into M regions corresponding to 
the M contents. The optimal policy schedules a content for 
multicasting when the request queue state falls in the region 
corresponding to the content. For the uniform case with two 
contents, we further show that the switch curve is monotonically
non-decreasing. Next, for the nonuniform case, we show
that the optimal policy has a partial switch structure, which 
is similar to the switch structure in the uniform case. The 
difference reflects the channel asymmetry among the users. 
Then, we propose two low-complexity optimal algorithms by 
exploiting these structural properties of the optimal policy. 
Note that, although the switch structures may look intuitive, it 
is challenging to prove these structures rigorously. Motivated 
by the switch structures of the optimal policy, to further reduce 
the complexity, we also propose a low-complexity subopti¬ 
mal solution using approximate dynamic programming M- 
Different from suboptimal solutions obtained using existing 
approximate approaches, the proposed suboptimal solution 
possesses similar structural properties to the optimal policy. 
Then, we develop a low-complexity algorithm to compute 
the suboptimal policy. These analytical results hold for both 
i.i.d. request arrival and Markov-modulated request arrival 
models. Numerical results verify the theoretical analysis and 
demonstrate the performance of the proposed optimal and 
suboptimal solutions. The important notations used in this
paper are summarized in Table I.

II. Network Model

As illustrated in Fig. 1, we consider a cache-enabled
content-centric wireless network with one BS, $K$ users and
$M$ contents. Let $\mathcal{K} = \{1, 2, \cdots, K\}$ denote the set of users.
In our model, each user represents a group of users in the
same location. Let $\mathcal{M} = \{1, 2, \cdots, M\}$ denote the set of
| Notation | Description |
|---|---|
| $\mathcal{K}$ | set of users |
| $\mathcal{M}$ | set of all contents |
| $\mathcal{C}$ | set of contents cached in the BS |
| $k$, $m$ | user, content index |
| $t$ | slot index |
| $p(m,k)$ | minimum transmission power required for delivering content $m$ to user $k$ within a slot |
| $f(m)$ | fetching cost of content $m$ |
| $\mathbf{A} = (A_{m,k})$ | request arrival matrix |
| $\mathbf{Q} = (Q_m)$ | request queue state vector for uniform case |
| $\mathbf{Q} = (Q_{m,k})$ | request queue state matrix for nonuniform case |
| $d(\mathbf{Q})$ | sum request queue length |
| $\mu$ | stationary multicast scheduling policy |
| $u$ | multicast scheduling action |
| $V(\cdot)$ | value function |
| $J(\cdot,\cdot)$ | state-action cost function |

TABLE I: List of important notations


contents, where content $m \in \mathcal{M}$ has size $l_m$ (in bits).
Consider time slots of unit length (without loss of generality),
indexed by $t = 1, 2, \cdots$.¹ In each slot, each user submits
content requests to the BS according to a general distribution.
The BS maintains request queues for different contents, which
are implemented using counters. The BS is equipped with a
cache storing a certain number of contents, depending on the
cache size and the sizes of the cached contents. We assume
the contents stored in the cache are given. Note that caching
operates on a much larger timescale; in this work, we consider
multicast scheduling on a smaller timescale for a given caching
design. Let $\mathcal{C} \subseteq \mathcal{M}$ denote the set of cached contents. The
BS can fetch any uncached content from the core network
through a backhaul link, with a fetching cost depending on
the content size. In each slot, the BS schedules one content
for multicasting to serve the users' pending requests, with a
power cost depending on both the content size and the channel
conditions of the users being served. In the following, we
elaborate on the physical layer model, the service model, and
the request model.



Fig. 1: Cache-enabled content-centric wireless network. 


A. Physical Layer Model 

We assume that the duration of the scheduling slot is long
enough to average the small-scale channel fading process, and
hence the ergodic capacity can be achieved using channel
coding.² Let $h_k$ denote the average channel gain between user
$k$ and the BS. Assume that only one content is delivered in
each slot. Let $p(m,k)$ denote the minimum transmission power
required for delivering content $m$ to user $k$ within a scheduling
slot. Assume $p(m,k)$ satisfies $p(m,k) = y(h_k, l_m)$, where
$y(h, l)$ is monotonically non-increasing with $h$ for all $l > 0$.
Without loss of generality, we assume that
$h_1 \geq h_2 \geq \cdots \geq h_K$, which implies
$p(m,1) \leq p(m,2) \leq \cdots \leq p(m,K)$ for
all $m$. In this paper, we consider the uniform and nonuniform
channel cases. In the uniform case, the channel gains of
different users are the same, and hence we have
$p(m,1) = p(m,2) = \cdots = p(m,K) = p(m)$ for each $m$. In the
nonuniform case, the channel gains of different users can be
different, and hence for each $m$, $p(m,k)$ can be different for
different users.

¹ We consider an abstract model to capture the main features of
cache-enabled content-centric networks. The contents could be short videos,
soundtracks, e-publications, etc., and the duration of a slot can be several
seconds or minutes depending on the specific type of contents considered in
this model.

B. Service Model 

We consider multicast service for content delivery. For
clarity, we assume that in each slot, the BS schedules one
content for multicasting to serve the users' pending requests.
The analytical framework and results can be extended to the
general case in which the BS can transmit multiple contents in
each slot. Let $\mathcal{K}(m,t) \subseteq \mathcal{K}$ denote the set of users who have
pending requests for content $m$ at slot $t$. Let $u(t) \in \mathcal{M}$ denote
the content scheduled for multicasting at slot $t$. If content
$u(t)$ is cached (i.e., $u(t) \in \mathcal{C}$), the BS transmits it to all the
users in $\mathcal{K}(u(t),t)$ directly; otherwise, the BS first downloads
$u(t)$ from the core network through the backhaul link, then
multicasts it to the users in $\mathcal{K}(u(t),t)$ and finally discards it
after the transmission. Note that we consider fixed content
placement and there is no extra cache storage to hold a newly
fetched content.

Next, we illustrate the fetching and power costs. Let $c(m)$
denote the cost for fetching content $m$ via the backhaul link,
depending on the content size. Then, the fetching cost is given
by

$$f(m) = \mathbf{1}(m \notin \mathcal{C})\, c(m), \tag{1}$$

where $\mathbf{1}(\cdot)$ denotes the indicator function. Let
$k^*(m,t) \in \mathcal{K}(m,t)$ denote the user who requires the highest
transmission power among the users in $\mathcal{K}(m,t)$, i.e.,
$k^*(m,t) = \max \mathcal{K}(m,t)$. Then, to deliver content $m$ to all the users in
$\mathcal{K}(m,t)$ within a slot, the power cost $p(m,t)$ is given by

$$p(m,t) = p(m, k^*(m,t)) = \max_{k \in \mathcal{K}(m,t)} p(m,k). \tag{2}$$
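As a minimal sketch of the cost definitions in (1) and (2), the following Python fragment evaluates the fetching cost and the multicast power cost for one slot. The numerical values, the 0-based indexing, and the choice to return zero power when no user is waiting are assumptions made for illustration only; they are not taken from the paper.

```python
def fetching_cost(m, cached, c):
    """Eq. (1): f(m) = 1(m not in C) * c(m)."""
    return 0.0 if m in cached else c[m]


def power_cost(m, pending_users, p):
    """Eq. (2): largest per-user power among users with pending requests.

    Returning 0.0 for an empty set is an assumption of this sketch.
    """
    return max((p[m][k] for k in pending_users), default=0.0)


# Hypothetical instance: 2 contents, 3 users (0-based indices).
c = {0: 1.0, 1: 2.5}                 # backhaul cost per content
cached = {0}                         # content 0 is in the cache
p = {0: [1.0, 1.5, 2.0],             # p[m][k], non-decreasing in k
     1: [2.0, 3.0, 4.0]}

f0 = fetching_cost(0, cached, c)     # cached content, no fetching cost
f1 = fetching_cost(1, cached, c)     # uncached content, pays c[1]
pw = power_cost(1, {0, 2}, p)        # weakest pending user is k = 2
```

Because $p(m,k)$ is non-decreasing in $k$ under the channel ordering, the maximum is always attained at the largest pending user index, which is what $k^*(m,t)$ expresses.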

C. Request Model 

In each slot, each user submits content requests to the BS.
Notice that each user (representing a group of users in the
same location) can submit multiple requests for each content
in each slot. Let $A_{m,k}(t) \in \mathcal{A}_{m,k} \subseteq \{0, 1, \cdots\}$ denote
the number of new request arrivals for content $m$ from
user $k$ at the end of slot $t$, where $m \in \mathcal{M}$ and $k \in \mathcal{K}$.
Let $\mathbf{A}(t) = (A_{m,k}(t))_{m \in \mathcal{M}, k \in \mathcal{K}} \in \mathcal{A} \triangleq \prod_{m \in \mathcal{M}} \prod_{k \in \mathcal{K}} \mathcal{A}_{m,k}$ denote
the request arrival matrix at slot $t$. We assume that $A_{m,k}(t)$ is

^Note that, this assumption is also used in (3 and Ha. 
i.i.d. over slots and independent w.r.t. m according to a general
distribution. For ease of illustration, we assume that the request 
arrival process is i.i.d. according to the Independent Reference 
Model (IRM), which is a standard approach adopted in the 
literature lO, ifTSll . The IRM is reasonable as each user in our 
model represents a group of users in the same location 12^ . In 
Section HXl we shall extend the analysis for the i.i.d. request 
arrival model to a Markov-modulated request arrival model. 
The BS maintains request queues for different contents. The 
request queues are implemented using counters and no data is 
stored in these request queues. In the following, we introduce 
two request queue models for the uniform and nonuniform 
cases, respectively. 

1) Uniform Case: In the uniform case, once content $m$
is multicast using transmission power $p(m)$, all the users
can receive content $m$. Therefore, we do not differentiate
the requests for each content at the user level. Specifically,
the BS maintains a separate request queue for each content
$m \in \mathcal{M}$. Let $Q_m(t) \in \mathcal{Q}_m = \{0, 1, \cdots, N_m\}$ denote the
request queue length for content $m$ at the beginning of slot $t$,
where $N_m$ is assumed to be finite (can be sufficiently large) for
technical tractability. As illustrated in Section II-B, if content
$m$ is scheduled for transmission at slot $t$ (i.e., $u(t) = m$),
all the pending requests for content $m$ are satisfied, i.e., the
request queue for content $m$ is emptied. Thus, the request
queue dynamics for content $m$ is as follows:

$$Q_m(t+1) = \min\{\mathbf{1}(u(t) \neq m)\, Q_m(t) + A_m(t),\ N_m\}, \tag{3}$$

where $A_m(t) = \sum_{k \in \mathcal{K}} A_{m,k}(t)$ denotes the total number of
the request arrivals for content $m$ at the end of slot $t$. Let
$\mathbf{Q}(t) = (Q_m(t))_{m \in \mathcal{M}} \in \mathcal{Q}$ denote the request queue state
vector at the beginning of slot $t$ in the uniform case, where
$\mathcal{Q} = \prod_{m \in \mathcal{M}} \mathcal{Q}_m$ denotes the request queue state space in the
uniform case.
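The queue update in (3) can be illustrated with a short Python sketch. The function name `step_uniform`, the list-based state, and the 0-based content indices are choices made here for illustration and are not part of the paper.

```python
def step_uniform(Q, u, A, N):
    """One slot of the uniform-case dynamics in Eq. (3).

    Q[m]: queue length of content m at the start of the slot
    u:    index of the content scheduled in this slot
    A[m]: total new requests for content m arriving at the end of the slot
    N[m]: queue cap for content m
    """
    return [min((0 if m == u else Q[m]) + A[m], N[m]) for m in range(len(Q))]


# Content 0 is multicast: its queue is emptied before arrivals are added,
# while content 1 keeps accumulating and hits its cap of 5.
Q_next = step_uniform(Q=[3, 2], u=0, A=[1, 4], N=[5, 5])
```

The scheduled queue restarts from the new arrivals only, which is the multicast gain: one transmission clears every pending request for that content.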

2) Nonuniform Case: In the nonuniform case, different
transmission powers are required to deliver a content to
different users, as illustrated in Section II-B. Therefore, we
differentiate the requests for each content at the user level.
Specifically, the BS maintains a separate request queue for
each content-user pair $(m,k) \in \mathcal{M} \times \mathcal{K}$. Let
$Q_{m,k}(t) \in \mathcal{Q}_{m,k} = \{0, 1, \cdots, N_{m,k}\}$ denote the request queue length for
content-user pair $(m,k)$ at the beginning of slot $t$, where $N_{m,k}$
is assumed to be finite (can be sufficiently large) for technical
tractability. Therefore, $\mathcal{K}(m,t)$ can be expressed in terms of
the request queue state, i.e., $\mathcal{K}(m,t) = \{k \mid Q_{m,k}(t) > 0\}$.
The request queue dynamics for content-user pair $(m,k)$ is as
follows:

$$Q_{m,k}(t+1) = \min\{\mathbf{1}(u(t) \neq m)\, Q_{m,k}(t) + A_{m,k}(t),\ N_{m,k}\}. \tag{4}$$

Let $\mathbf{Q}_m(t) = (Q_{m,k}(t))_{k \in \mathcal{K}} \in \mathcal{Q}_m$ denote the request queue
state vector for content $m$ at the beginning of slot $t$ in the
nonuniform case, where $\mathcal{Q}_m = \prod_{k \in \mathcal{K}} \mathcal{Q}_{m,k}$ denotes the
request queue state space for content $m$ in the nonuniform
case. Let $\mathbf{Q}(t) = (\mathbf{Q}_m(t))_{m \in \mathcal{M}} \in \mathcal{Q}$ denote the request queue
state matrix at the beginning of slot $t$ in the nonuniform case,
where $\mathcal{Q} \triangleq \prod_{m \in \mathcal{M}} \mathcal{Q}_m$ denotes the request queue state space
in the nonuniform case.


Note that, in (3) and (4), once a content is scheduled, the
corresponding request queue (in the uniform case) or request
queues (in the nonuniform case) are emptied. This special form
of queue departure reflects the multicast gain. Our framework
holds for any number of users and any profile of request
arrivals.
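For the nonuniform case, a similar sketch covers the per-pair update in (4) together with the set of users holding pending requests, $\{k \mid Q_{m,k}(t) > 0\}$. The nested-list layout `Q[m][k]` and the function names are assumptions of this illustration.

```python
def pending_users(Q, m):
    """Users k with Q[m][k] > 0, i.e. the set K(m, t) for the current state."""
    return {k for k, q in enumerate(Q[m]) if q > 0}


def step_nonuniform(Q, u, A, N):
    """One slot of the nonuniform-case dynamics in Eq. (4).

    Scheduling content u empties all of its per-user queues at once.
    """
    return [
        [min((0 if m == u else Q[m][k]) + A[m][k], N[m][k])
         for k in range(len(Q[m]))]
        for m in range(len(Q))
    ]


Q = [[1, 0], [2, 3]]                 # 2 contents x 2 users
A = [[0, 1], [1, 0]]                 # arrivals at the end of the slot
N = [[4, 4], [4, 4]]                 # queue caps
served = pending_users(Q, 1)         # both users wait for content 1
Q_next = step_nonuniform(Q, u=1, A=A, N=N)
```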

III. Problem Formulation and Optimality Equation

A. Problem Formulation 

Given an observed request queue state, the multicast
scheduling action $u$ is determined according to a stationary
policy defined below.

Definition 1 (Stationary Multicast Scheduling Policy): A
stationary multicast scheduling policy $\mu$ is a mapping from
the request queue state $\mathbf{Q} \in \mathcal{Q}$ to the multicast scheduling
action $u \in \mathcal{M}$, where $\mu(\mathbf{Q}) = u$.

By the queue dynamics in (3) or (4), the induced random
process $\{\mathbf{Q}(t)\}$ under policy $\mu$ is a controlled Markov chain.
We restrict our attention to stationary unichain policies.³ For
a given stationary unichain policy $\mu$, the average delay cost is
defined as

$$\bar{d}(\mu) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[d(\mathbf{Q}(t))], \tag{5}$$

where the expectation is taken w.r.t. the measure induced
by the random request arrivals and the policy $\mu$,
$d(\mathbf{Q}(t)) = \sum_{m \in \mathcal{M}} Q_m(t)$ in the uniform case and
$d(\mathbf{Q}(t)) = \sum_{m \in \mathcal{M}} \sum_{k \in \mathcal{K}} Q_{m,k}(t)$ in the nonuniform case. By Little's law, $\bar{d}(\mu)$
reflects the average waiting time in the network under policy
$\mu$. By (1) and (2), the average fetching and power costs are
given by

$$\bar{f}(\mu) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[f(u(t))], \tag{6}$$

$$\bar{p}(\mu) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[p(\mathbf{Q}(t), u(t))]. \tag{7}$$

Here, with abuse of notation, we also use $p(\mathbf{Q}(t), u(t))$ to
represent $p(u(t), t)$ given in (2), as $\mathcal{K}(u(t),t) = \{k \mid Q_{u(t),k}(t) > 0\}$.
Note that there is an inherent tradeoff between
the delay cost and the service cost (including the power and
fetching costs) in our model. As a simple example, we consider
a multicast scheduling policy for the uniform case, where
content $m$ is scheduled only if $Q_m \geq Q_{th}$. As illustrated
in Fig. 2, for content $m$, when $Q_{th}$ increases,
the average service cost decreases while the average delay cost
increases. This is because, when $Q_{th}$ increases, each of
the $Q_{th}$ requests waits longer while its share of the service
cost decreases.

Therefore, to capture this tradeoff, we define the average 
system cost (weighted sum cost) under a given stationary 

^A unichain policy is a policy, under which the induced Markov chain has 
a single recuiTent class (and possibly some transient states) (m 
Fig. 2: Tradeoff between the average delay cost and the average
service cost for a certain content.


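unichain policy $\mu$ as

$$\bar{g}(\mu) = \bar{d}(\mu) + w_f \bar{f}(\mu) + w_p \bar{p}(\mu) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[g(\mathbf{Q}(t), u(t))], \tag{8}$$

where $w_f$ and $w_p$ are the associated weights for the fetching
and power costs, respectively, which reflect the tradeoff, and
$g(\mathbf{Q}, u) = d(\mathbf{Q}) + w_f f(u) + w_p p(\mathbf{Q}, u)$ is the per-stage cost.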
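A minimal sketch of the per-stage cost for the uniform case follows, combining the delay, fetching, and power terms with their weights. Charging the power $p(u)$ even when the scheduled queue is empty is an assumption of this sketch, as are the example numbers.

```python
def per_stage_cost(Q, u, cached, c, p, w_f=1.0, w_p=1.0):
    """Per-stage cost: sum of queue lengths + w_f * f(u) + w_p * p(u).

    Uniform case only: p[u] does not depend on which users are waiting.
    """
    delay = sum(Q)                            # d(Q)
    fetch = 0.0 if u in cached else c[u]      # f(u), Eq. (1)
    power = p[u]                              # p(u) in the uniform case
    return delay + w_f * fetch + w_p * power


# Hypothetical instance: content 1 is not cached, so it pays c[1].
g = per_stage_cost(Q=[2, 3], u=1, cached={0},
                   c={0: 1.0, 1: 2.0}, p={0: 1.0, 1: 1.5},
                   w_f=2.0, w_p=0.5)
```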

We wish to find an optimal multicast scheduling policy to
minimize the average system cost in (8).

Problem 1 (System Cost Minimization Problem):

$$\bar{g}^* = \min_{\mu} \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[g(\mathbf{Q}(t), u(t))], \tag{9}$$

where $\mu$ is a stationary unichain multicast scheduling policy
and $\bar{g}^*$ denotes the minimum average system cost achieved by
the optimal policy $\mu^*$.

Problem 1 is an infinite horizon average cost MDP, which
is well-known to be a difficult problem due to the curse of
dimensionality. According to [24, Theorem 8.4.5], for unichain
infinite horizon average cost MDPs with finite state and action
spaces, there always exists a deterministic stationary policy
that is optimal. Note that these requirements are satisfied by
the MDP considered in our work. Therefore, it is sufficient to
focus on the deterministic stationary policy space.

B. Optimality Equation 

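The optimal multicast scheduling policy $\mu^*$ can be obtained
by solving the following Bellman equation.

Lemma 1 (Bellman Equation): There exist a scalar $\theta$ and a
value function $V(\cdot)$ satisfying

$$\theta + V(\mathbf{Q}) = \min_{u \in \mathcal{M}} \{ g(\mathbf{Q}, u) + \mathbb{E}[V(\mathbf{Q}')] \}, \quad \forall \mathbf{Q} \in \mathcal{Q}, \tag{10}$$

where the expectation is taken over the distribution of the
request arrival $\mathbf{A}$, and $\mathbf{Q}' = (Q'_m)_{m \in \mathcal{M}}$ with
$Q'_m = \min\{\mathbf{1}(u \neq m)\, Q_m + A_m,\ N_m\}$ in the uniform case;
$\mathbf{Q}' = (Q'_{m,k})_{m \in \mathcal{M}, k \in \mathcal{K}}$ with
$Q'_{m,k} = \min\{\mathbf{1}(u \neq m)\, Q_{m,k} + A_{m,k},\ N_{m,k}\}$ in the nonuniform case.
Then, $\theta = \bar{g}^*$ is the optimal value to Problem 1 for all
initial states $\mathbf{Q}(1) \in \mathcal{Q}$, and
the optimal policy $\mu^*$ achieving $\bar{g}^*$ is given by

$$\mu^*(\mathbf{Q}) = \arg\min_{u \in \mathcal{M}} \{ g(\mathbf{Q}, u) + \mathbb{E}[V(\mathbf{Q}')] \}, \quad \forall \mathbf{Q} \in \mathcal{Q}. \tag{11}$$

Proof: Please see Appendix A. ■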

From the Bellman equation in (10), we can see that $\mu^*$
depends on the state $\mathbf{Q}$ through the value function $V(\cdot)$.
Obtaining $V(\cdot)$ involves solving the Bellman equation for all
$\mathbf{Q}$, for which there is no closed-form solution in general [14].
Brute-force numerical solutions such as value iteration and
policy iteration do not typically offer many design insights,
and are usually impractical to implement in real
systems due to the curse of dimensionality [14]. Therefore, it
is desirable to study the structure of $\mu^*$.
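To make the brute-force baseline concrete, the following sketch runs a plain relative value iteration on a tiny uniform-case instance with two contents. It is an illustrative implementation under stated assumptions, not the paper's algorithm: arrivals are assumed independent across contents, the reference state is the empty-queue state, the iteration count is fixed instead of using a stopping rule, and the cost function lumps the weighted fetching and power costs into one hypothetical number per content.

```python
import itertools


def rvia(N, arrival_pmf, cost, n_iter=500):
    """Relative value iteration for the uniform case.

    N[m]:           queue cap of content m
    arrival_pmf[m]: dict {arrivals: probability} for content m
    cost(Q, u):     per-stage cost of scheduling u in state Q
    Returns (theta, V, policy) with V[(0, ..., 0)] == 0.
    """
    M = len(N)
    states = list(itertools.product(*[range(n + 1) for n in N]))
    # Joint arrival outcomes, assuming independence across contents.
    outcomes = []
    for combo in itertools.product(*[pmf.items() for pmf in arrival_pmf]):
        prob = 1.0
        for _, pr in combo:
            prob *= pr
        outcomes.append((tuple(a for a, _ in combo), prob))

    def next_state(Q, u, a):
        # Eq. (3): the scheduled queue is emptied, then arrivals are added.
        return tuple(min((0 if m == u else Q[m]) + a[m], N[m])
                     for m in range(M))

    V = {s: 0.0 for s in states}
    ref = states[0]                       # reference state (all queues empty)
    theta, J = 0.0, {}
    for _ in range(n_iter):
        J = {s: [cost(s, u) + sum(pr * V[next_state(s, u, a)]
                                  for a, pr in outcomes)
                 for u in range(M)]
             for s in states}
        theta = min(J[ref])               # running estimate of the average cost
        V = {s: min(J[s]) - theta for s in states}
    policy = {s: min(range(M), key=J[s].__getitem__) for s in states}
    return theta, V, policy


service = [1.0, 3.0]                      # hypothetical weighted service costs


def cost(Q, u):
    return sum(Q) + service[u]


theta, V, policy = rvia([3, 3], [{0: 0.5, 1: 0.5}] * 2, cost)
```

Even for this toy instance every state-action pair is evaluated in every iteration, so the work grows with the product of the queue caps, which is the curse of dimensionality mentioned above.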

To analyze the structure of $\mu^*$, we also introduce the
state-action cost function:

$$J(\mathbf{Q}, u) \triangleq g(\mathbf{Q}, u) + \mathbb{E}[V(\mathbf{Q}')]. \tag{12}$$

Note that $J(\mathbf{Q}, u)$ is related to the R.H.S. of the Bellman
equation in (10). In particular, based on Lemma 1, the optimal
policy $\mu^*$ can be expressed in terms of $J(\mathbf{Q}, u)$, i.e.,

$$\mu^*(\mathbf{Q}) = \arg\min_{u \in \mathcal{M}} J(\mathbf{Q}, u), \quad \forall \mathbf{Q} \in \mathcal{Q}. \tag{13}$$

In Sections IV and V, we shall analyze the structures of
the optimal policies for the uniform and nonuniform cases,
respectively, based on the properties of the value function
$V(\mathbf{Q})$ and the state-action cost function $J(\mathbf{Q}, u)$.

IV. Optimality Properties in Uniform Case

In this section, we consider the uniform case. We first show 
that the optimal policy has a switch structure. Then, we show 
that the switch curve is monotonically non-decreasing for the 
uniform case with two contents. 

A. Structure of Optimal Policy 

Problem 1 can be treated as the problem of scheduling
a broadcast server to parallel queues with general random
arrivals, channel conditions, and content sizes. Therefore,
the structural analysis is more challenging than the existing
structural analysis for simple queueing systems (see Section
I for the detailed discussion). First, by RVIA and the special
structures of the request queue dynamics, as well as the power
and fetching costs, we have the following property of $V(\mathbf{Q})$.

Lemma 2 (Monotonicity of Value Function): In the uniform
case, for any $\mathbf{Q}^1, \mathbf{Q}^2 \in \mathcal{Q}$ such that $\mathbf{Q}^2 \succeq \mathbf{Q}^1$, we
have $V(\mathbf{Q}^2) \geq V(\mathbf{Q}^1)$.⁴

Proof: Please see Appendix B. ■

Then, based on Lemma 2 and the special properties of
multicasting, we have the following property of $J(\mathbf{Q}, u)$.

Lemma 3 (Monotonicity of State-Action Cost Function):
In the uniform case, for any $u, v \in \mathcal{M}$ and $v \neq u$,
$J(\mathbf{Q}, u) - J(\mathbf{Q}, v)$ is monotonically non-increasing with $Q_u$,
i.e.,

$$J(\mathbf{Q} + \mathbf{e}_u, u) - J(\mathbf{Q} + \mathbf{e}_u, v) \leq J(\mathbf{Q}, u) - J(\mathbf{Q}, v), \tag{14}$$

where $\mathbf{e}_u$ denotes the $1 \times M$ vector with all entries 0 except
for a 1 in its $u$-th entry.

Proof: Please see Appendix C. ■

⁴ The notation $\succeq$ indicates component-wise $\geq$.

Note that the property of $J(\mathbf{Q}, u)$ in Lemma 3 is similar to
the diminishing-return property of submodular functions used
in the existing structural analysis [25]. Lemma 3 comes from
the special structure introduced by multicasting and is key to
analyzing the optimality properties. Lemma 3 indicates that, if
it is better to multicast content $u$ than content $v$ for some
state $\mathbf{Q}$ (i.e., $J(\mathbf{Q}, u) \leq J(\mathbf{Q}, v)$), then it is also better to
multicast content $u$ than $v$ for state $\mathbf{Q} + \mathbf{e}_u$ (i.e.,
$J(\mathbf{Q} + \mathbf{e}_u, u) \leq J(\mathbf{Q} + \mathbf{e}_u, v)$). This leads to the following switch structure of
the optimal policy $\mu^*$.

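Theorem 1 (Switch Structure of Optimal Policy): The optimal
policy $\mu^*$ in the uniform case has a switch structure, i.e.,
for all $u \in \mathcal{M}$, we have

$$\mu^*(\mathbf{Q}) = u, \quad \text{if } Q_u \geq s_u(\mathbf{Q}_{-u}), \tag{15}$$

where the switch curve for content $u$ is given by

$$s_u(\mathbf{Q}_{-u}) \triangleq \begin{cases} \min \mathcal{S}_u(\mathbf{Q}_{-u}), & \text{if } \mathcal{S}_u(\mathbf{Q}_{-u}) \neq \emptyset, \\ \infty, & \text{otherwise}, \end{cases}$$

with $\mathcal{S}_u(\mathbf{Q}_{-u}) \triangleq \{Q_u \mid J(\mathbf{Q}, u) \leq J(\mathbf{Q}, v)\ \forall v \in \mathcal{M}, v \neq u\}$.
Here, $\mathbf{Q}_{-u} \triangleq (Q_m)_{m \in \mathcal{M}, m \neq u}$ denotes the request queue state
vector corresponding to all other contents except content $u$.

Proof: Please see Appendix D. ■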

Remark 1: Theorem 1 indicates that the request queue
state space is divided into $M$ regions corresponding to the
$M$ contents, and the optimal policy schedules a content for
multicasting when the request queue state falls in the region
corresponding to the content, as illustrated in Fig. 3(a). In
addition, given $\mathbf{Q}_{-u}$, the scheduling for content $u$ is of the
threshold type, as illustrated in Fig. 3(b). Specifically, if
$Q_u \geq s_u(\mathbf{Q}_{-u})$, the BS schedules content $u$ for multicasting and the
request queue for content $u$ is emptied; if $Q_u < s_u(\mathbf{Q}_{-u})$,
the BS keeps on waiting to gather more requests for content $u$
and the request queue for content $u$ keeps on increasing. This
indicates that, when $Q_u$ is small (i.e., the delay cost is small),
it is not efficient to schedule content $u$, as a higher power cost
(and a higher fetching cost if $u \notin \mathcal{C}$) is consumed per request
for content $u$; when $Q_u$ is large (i.e., the delay cost is large),
it is more efficient to schedule content $u$, as the requests for
content $u$ are more urgent. This reveals the tradeoff between the
delay cost and the power cost (and the fetching cost if $u \notin \mathcal{C}$)
for content $u$.
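For two contents, the threshold description in Remark 1 can be turned into a small routine that reads a switch curve off a policy table. The policy used below is a made-up example (schedule content 0 whenever its queue is at least as long as the other one), and the function treats the table's single action per state as the set membership test, so ties between actions are not represented; both are assumptions of this sketch.

```python
import math


def switch_curve(policy, u, N):
    """For M = 2, return [s_u(q) for q = 0..N[v]], where v is the other content.

    s_u(q) is the smallest Q_u at which the policy schedules u given Q_v = q,
    or math.inf if the policy never schedules u for that q.
    """
    v = 1 - u
    curve = []
    for q_other in range(N[v] + 1):
        hits = []
        for q_u in range(N[u] + 1):
            Q = [0, 0]
            Q[u], Q[v] = q_u, q_other
            if policy[tuple(Q)] == u:
                hits.append(q_u)
        curve.append(min(hits) if hits else math.inf)
    return curve


N = [3, 3]
# Hypothetical policy table, used only to exercise the function.
policy = {(q0, q1): 0 if q0 >= q1 else 1
          for q0 in range(N[0] + 1) for q1 in range(N[1] + 1)}

s0 = switch_curve(policy, 0, N)      # thresholds on Q_0 as a function of Q_1
s1 = switch_curve(policy, 1, N)      # thresholds on Q_1 as a function of Q_0
```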

Remark 2: From Theorem 1, we can see that cache placement
does not affect the structural properties of the optimal
policy. That is, the switch structure holds for any cache
placement strategy. However, cache placement does affect
the values of the switch curves of the optimal policy. The
reason is that cache placement affects the tradeoff among the
delay, power and fetching costs through the fetching
costs, and the switch curves of the optimal policy are adaptive
to this tradeoff. The impacts of the fetching costs on the switch
curves can be observed from Fig. 4(a).

Note that, although the exact values of the switch curves rely
on the exact value of $V(\mathbf{Q})$, the switch structural property only
relies on the monotonicity properties of $V(\mathbf{Q})$ and $J(\mathbf{Q}, u)$.
These structural properties can be used to reduce the
computational complexity in obtaining the optimal policy, without
knowing the exact value of the switch curves. Specifically,
from Theorem 1, we know that, for all $\mathbf{Q} \in \mathcal{Q}$,

$$\mu^*(\mathbf{Q}) = u \ \Rightarrow\ \mu^*(\mathbf{Q} + \mathbf{e}_u) = u. \tag{16}$$

Therefore, computing the optimal policy $\mu^*$ requires
conducting the minimization in the R.H.S. of (11) for some $\mathbf{Q}$
only (instead of all $\mathbf{Q} \in \mathcal{Q}$), which significantly reduces
the computational complexity. Later, in Section VI, we shall
design low complexity optimal algorithms based on (16).

Fig. 3: Switch structure of optimal scheduling in the uniform case.
(a) Three-content case. (b) Two-content case.

Fig. 4: Impacts of the fetching costs on switch curves in the uniform
and nonuniform cases. (a) Uniform case: two-content (horizontal axis: $Q_1$).
(b) Nonuniform case: two-content two-user.

B. Special Case: Two Contents 

Now, consider the special uniform case with two contents,
i.e., $M = 2$. By Theorem 1, we can see that, for $M = 2$,
either one of the two switch curves, i.e., $s_1(Q_2)$ and $s_2(Q_1)$,
is sufficient to characterize the optimal policy. Moreover,
by Lemma 2 and Lemma 3, $s_1(Q_2)$ and $s_2(Q_1)$ have the
following property.

Lemma 4 (Monotonicity of Switch Curve): For the uniform
case with two contents, the switch curves $s_1(Q_2)$ and $s_2(Q_1)$
of the optimal policy are monotonically non-decreasing in $Q_2$
and $Q_1$, respectively.

Proof: Please see Appendix E. ■

Fig. 3(b) illustrates the monotonicity of the switch curve. We
characterize the number of policies with monotonically
non-decreasing switch curves in the following proposition.

Fig. 5: Illustration of the notation defined at the beginning of Section V.

Fig. 6: Partial switch structure of optimal scheduling in the nonuniform case. Two-content, two-user case with $A_{2,2}(t) = 0, \forall t$. (a) Whole space. (b) Fixed $Q_{2,1}$. (c) Fixed $Q_{1,2}$.

Proposition 1: For the uniform case with two contents, the
number of the policies with monotonically non-decreasing
switch curves is $\binom{N_1+N_2+2}{N_1+1}$.

Proof: Please see Appendix F. ■

Table II shows that the space of possible optimal policies
in the uniform case with two contents can be substantially
reduced based on Lemma 4.

Queue Size       Policy in Definition 1       Policy with Monotonically
$(N_1, N_2)$     $2^{(N_1+1)(N_2+1)}$         Non-decreasing Switch Curve
                                              $\binom{N_1+N_2+2}{N_1+1}$
(4,4)            $3.36 \times 10^{7}$         252
(8,8)            $2.42 \times 10^{24}$        48620

TABLE II: Policy space size in the uniform case at M = 2.
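The entries of Table II can be reproduced directly from the two closed-form counts. The following is a minimal sketch for checking the arithmetic; the function names are hypothetical and are not taken from the paper.

```python
from math import comb


def num_policies(n1: int, n2: int) -> int:
    """Count of policies in the 'Definition 1' column: 2^((N1+1)(N2+1))."""
    return 2 ** ((n1 + 1) * (n2 + 1))


def num_monotone_policies(n1: int, n2: int) -> int:
    """Count of policies with a monotonically non-decreasing switch curve."""
    return comb(n1 + n2 + 2, n1 + 1)


# Print both counts for the two queue sizes listed in Table II.
for n in (4, 8):
    print((n, n), f"{num_policies(n, n):.2e}", num_monotone_policies(n, n))
```

The gap between the two columns widens quickly with the queue size, which is the point the table makes about the reduced policy space.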


V. Optimality Properties in Nonuniform Case

In this section, we characterize the structure of the optimal 
policy for the nonuniform case. Note that, different from the 
uniform case, the power cost p(Q, u) in the nonuniform case 
also depends on the request queue state Q. Therefore, due to 
the coupling among the request queues, the structural analysis 
for the nonuniform case is more challenging than that for the 
uniform case. 

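To analyze the structure of the optimal policy, we first
introduce a new notation (see Fig. 5 for an example). For
each $m$, define $\mathbf{Q}^2_m \succeq \mathbf{Q}^1_m$ if and only if

$$\begin{cases} Q^2_{m,k} \ge Q^1_{m,k}, & \text{if } k < \max\{k \mid Q^1_{m,k} > 0\}; \\ Q^2_{m,k} = Q^1_{m,k}, & \text{otherwise.} \end{cases}$$

Define $\mathbf{Q}^2 \succeq \mathbf{Q}^1$ if and only if $\mathbf{Q}^2_m \succeq \mathbf{Q}^1_m$ for all $m$.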

By RVIA and the special structures of the request queue
dynamics, as well as the power and fetching costs, we can
show the following property of $V(\mathbf{Q})$.

Lemma 5 (Partial Monotonicity of Value Function): In the
nonuniform case, for any $\mathbf{Q}^1, \mathbf{Q}^2 \in \mathcal{Q}$ such that $\mathbf{Q}^2 \succeq \mathbf{Q}^1$,
we have $V(\mathbf{Q}^2) \ge V(\mathbf{Q}^1)$.

Proof: Please see Appendix G. ■

Then, based on Lemma 5 and the special properties of
multicasting in the nonuniform channel case, we have the
following property of $J(\mathbf{Q}, u)$.

Lemma 6 (Partial Monotonicity of State-Action Cost Function):
In the nonuniform case, for any $u, v \in \mathcal{M}$, $v \neq u$ and
$\mathbf{Q} + \mathbf{E}_{u,k} \succeq \mathbf{Q}$, we have

$$J(\mathbf{Q}+\mathbf{E}_{u,k},u) - J(\mathbf{Q}+\mathbf{E}_{u,k},v) \le J(\mathbf{Q},u) - J(\mathbf{Q},v), \tag{17}$$

where $\mathbf{E}_{u,k}$ denotes the $M \times K$ matrix with all entries 0 except
for a 1 in its $(u,k)$-th entry.

Proof: Please see Appendix H. ■

Lemma 6 indicates that if it is better to multicast content $u$
than content $v$ for some state $\mathbf{Q}$ (i.e., $J(\mathbf{Q},u) \le J(\mathbf{Q},v)$) and
$\mathbf{Q}+\mathbf{E}_{u,k} \succeq \mathbf{Q}$, then it is also better to multicast content $u$ than
$v$ for state $\mathbf{Q}+\mathbf{E}_{u,k}$ (i.e., $J(\mathbf{Q}+\mathbf{E}_{u,k},u) \le J(\mathbf{Q}+\mathbf{E}_{u,k},v)$).
Thus, we have the following theorem.

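Theorem 2 (Partial Switch Structure of Optimal Policy):
The optimal policy $\mu^*$ in the nonuniform case has a partial
switch structure, i.e., for all $u \in \mathcal{M}$ and $k \in \mathcal{K}$, we have

$$\mu^*(\mathbf{Q}) = u, \ \text{if } Q_{u,k} \ge s_{u,k}(\mathbf{Q}_{-u,-k}) \text{ and condition (a) or (b) holds}, \tag{18}$$

where condition (a) is $k < k^\dagger(k,\mathbf{Q}_u)$, condition (b) is
$k > k^\dagger(k,\mathbf{Q}_u)$ and $s_{u,k}(\mathbf{Q}_{-u,-k}) > 0$, and the switch curve for
content-user pair $(u,k)$ is given by

$$s_{u,k}(\mathbf{Q}_{-u,-k}) \triangleq \begin{cases} \min \mathcal{S}_{u,k}(\mathbf{Q}_{-u,-k}), & \text{if } \mathcal{S}_{u,k}(\mathbf{Q}_{-u,-k}) \neq \emptyset \\ \infty, & \text{otherwise} \end{cases}$$

with $\mathcal{S}_{u,k}(\mathbf{Q}_{-u,-k}) \triangleq \{Q_{u,k} \mid J(\mathbf{Q},u) \le J(\mathbf{Q},v)\ \forall v \in \mathcal{M}, v \neq u\}$.
Here, $\mathbf{Q}_{-u,-k} \triangleq (Q_{m,i})_{m\in\mathcal{M},\, i\in\mathcal{K},\, (m,i)\neq(u,k)}$
denotes the request queue state matrix corresponding to all
the other content-user pairs except the content-user pair $(u,k)$,
and $k^\dagger(k,\mathbf{Q}_u) \triangleq \max\{i \mid Q_{u,i} > 0, i \neq k\}$.

Proof: Please see Appendix I. ■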

Remark 3: From Theorem 2, we can see that the structure
of the optimal policy in the nonuniform case is very similar to
the one in the uniform case, as illustrated in Fig. 6. The only
difference is that the structural property for $k > k^\dagger(k,\mathbf{Q}_u)$
and $s_{u,k}(\mathbf{Q}_{-u,-k}) = 0$ depends on the specific channel asymmetry
among the users and is still not known in general, as
illustrated in the dashed box of Fig. 6(b). Similar arguments
on the tradeoff and the impacts of cache placement for the
uniform case also hold for the nonuniform case.

Similarly, note that the partial switch structural property
only relies on the partial monotonicity properties of $V(\mathbf{Q})$
and $J(\mathbf{Q},u)$. These structural properties can also be used to
reduce the computational complexity in obtaining the optimal
policy for the nonuniform case, without knowing the exact
value of the switch curves. Specifically, from Theorem 2, we
know that, for all $\mathbf{Q} \in \mathcal{Q}$,

$$\mu^*(\mathbf{Q}) = u \text{ and } \mathbf{Q} + \mathbf{E}_{u,k} \succeq \mathbf{Q} \;\Rightarrow\; \mu^*(\mathbf{Q} + \mathbf{E}_{u,k}) = u. \tag{19}$$

Therefore, computing the optimal policy $\mu^*$ requires conducting
the minimization in the R.H.S. of (11) for some
$\mathbf{Q}$ only (instead of all $\mathbf{Q}$), which significantly reduces the
computational complexity. Based on (19), we shall design low
complexity optimal algorithms in Section VI.

VI. Low Complexity Optimal Algorithms 
In this section, we propose two low complexity optimal 
algorithms for both the uniform and nonuniform cases, by 
exploiting the structural properties of the optimal policy in 
Theorems 1 and 2.

A. Structured Relative Value Iteration Algorithm 

The optimal policy $\mu^*$ in (11) can be computed using RVIA,
which is a commonly used numerical method for solving
infinite horizon average cost MDPs based on the Bellman
equation [14, Chapter 4.3] and is detailed in Appendix B.
We first summarize the standard RVIA for computing $\mu^*$ in
Algorithm 1. It is shown in [14, Proposition 4.3.2] that under
Algorithm 1, for any $\{V_0(\mathbf{Q})\}$, we have $V_n(\mathbf{Q}) \to V(\mathbf{Q})$
for all $\mathbf{Q} \in \mathcal{Q}$, as $n \to \infty$, where $\{V(\mathbf{Q})\}$ satisfies the
Bellman equation in (10). Given $\{V(\mathbf{Q})\}$, we can obtain the
optimal policy $\mu^*$ by (11). Note that, in Step 2 of the standard
RVIA in Algorithm 1, a brute-force minimization over $M$
actions needs to be computed for each $\mathbf{Q} \in \mathcal{Q}$, which can
be computationally expensive when $M$ is large.

By exploiting the structural properties of the optimal policy
in Theorems 1 and 2, we modify (20) in Step 2 (value update)
of Algorithm 1 to reduce computational complexity. The modified
step is given by Algorithm 2 (structured value update).
Replacing (20) in Step 2 of Algorithm 1 with Algorithm 2, we
obtain a low complexity modified RVIA, which is referred to
as the structured relative value iteration algorithm (SRVIA).
From the proofs of Theorems 1 and 2, we can easily see
that under SRVIA, for any $\{V_0(\mathbf{Q})\}$, $V_n(\mathbf{Q}) \to V(\mathbf{Q})$ for


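Algorithm 1 Relative Value Iteration Algorithm

1: Set $V_0(\mathbf{Q}) = 0$ for all $\mathbf{Q} \in \mathcal{Q}$, select reference state $\mathbf{Q}^\S$,
and set $n = 0$.

2: (Value Update) For each state $\mathbf{Q} \in \mathcal{Q}$, compute $V_{n+1}(\mathbf{Q})$:

$$V_{n+1}(\mathbf{Q}) = \min_{u \in \mathcal{M}} \left\{ g(\mathbf{Q},u) + E[V_n(\mathbf{Q}')] \right\}, \tag{20}$$

where $\mathbf{Q}'$ is defined in Lemma 1.

3: For each state $\mathbf{Q} \in \mathcal{Q}$, normalize $V_{n+1}(\mathbf{Q})$:

$$V_{n+1}(\mathbf{Q}) \leftarrow V_{n+1}(\mathbf{Q}) - V_{n+1}(\mathbf{Q}^\S).$$

4: Go to Step 2.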


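Algorithm 2 Structured Value Update

1: if $\exists u \in \mathcal{M}$ such that $\mu^*_n(\mathbf{Q} - \mathbf{e}_u) = u$ (uniform case),
or $\exists u \in \mathcal{M}$ and $k \in \mathcal{K}$ such that $\mu^*_n(\mathbf{Q} - \mathbf{E}_{u,k}) = u$ and
$\mathbf{Q} \succeq \mathbf{Q} - \mathbf{E}_{u,k}$ (nonuniform case), then

$$\mu^*_n(\mathbf{Q}) = u,$$
$$V_{n+1}(\mathbf{Q}) = g(\mathbf{Q},u) + E[V_n(\mathbf{Q}')].$$

2: else

$$V_{n+1}(\mathbf{Q}) = \min_{u \in \mathcal{M}} \left\{ g(\mathbf{Q},u) + E[V_n(\mathbf{Q}')] \right\},$$
$$\mu^*_n(\mathbf{Q}) = \arg\min_{u \in \mathcal{M}} \left\{ g(\mathbf{Q},u) + E[V_n(\mathbf{Q}')] \right\}.$$

3: end if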


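Algorithm 3 Policy Iteration Algorithm

1: Set $\mu^*_0(\mathbf{Q}) = 1$ for all $\mathbf{Q} \in \mathcal{Q}$, select reference state $\mathbf{Q}^\S$,
and set $n = 0$.

2: (Policy Evaluation) Given policy $\mu^*_n$, compute the corresponding
average cost $\theta_n$ and value function $V_n(\mathbf{Q})$ from
the linear system of equations

$$\theta_n + V_n(\mathbf{Q}) = g(\mathbf{Q},\mu^*_n(\mathbf{Q})) + E[V_n(\mathbf{Q}')], \ \forall \mathbf{Q} \in \mathcal{Q}; \qquad V_n(\mathbf{Q}^\S) = 0, \tag{21}$$

where $\mathbf{Q}'$ is defined in Lemma 1.

3: (Policy Update) Obtain a new policy $\mu^*_{n+1}$, where for each
$\mathbf{Q} \in \mathcal{Q}$, $\mu^*_{n+1}(\mathbf{Q})$ is such that:

$$\mu^*_{n+1}(\mathbf{Q}) = \arg\min_{u \in \mathcal{M}} \left\{ g(\mathbf{Q},u) + E[V_n(\mathbf{Q}')] \right\}. \tag{22}$$

4: Go to Step 2 until $\mu^*_{n+1} = \mu^*_n$.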


all $\mathbf{Q} \in \mathcal{Q}$, as $n \to \infty$. Similarly, given $V(\mathbf{Q})$, we can obtain
$\mu^*$ in (11). In other words, SRVIA is an optimal algorithm.
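The value update with and without the structural shortcut can be sketched on a toy uniform-case instance. This is only an illustration under stated assumptions, not the authors' implementation: Bernoulli arrivals, the parameter values, the 0-based action indices and all names below are hypothetical, while the per-stage cost and queue dynamics follow the uniform-case forms given in Appendix B.

```python
import itertools
import math

# Hypothetical toy parameters (not taken from the paper).
N = (3, 3)             # queue limits N_m
ARRIVE = (0.3, 0.5)    # Pr[A_m = 1]; Bernoulli arrivals are an assumption
POWER = (2.0, 2.0)     # p(u)
FETCH = (0.0, 3.0)     # f(u); 0 models a cached content
W_P, W_F = 1.0, 1.0
M = len(N)             # actions are indexed 0..M-1 here

# itertools.product yields states in lexicographic order, so Q - e_u is
# always visited before Q within one sweep.
STATES = list(itertools.product(*(range(n + 1) for n in N)))
ARRIVALS = [
    (a, math.prod(ARRIVE[m] if a[m] else 1 - ARRIVE[m] for m in range(M)))
    for a in itertools.product((0, 1), repeat=M)
]


def action_cost(V, Q, u):
    """g(Q,u) + E[V(Q')] with Q'_m = min{1(u != m) Q_m + A_m, N_m}."""
    cost = sum(Q) + W_P * POWER[u] + W_F * FETCH[u]
    for a, prob in ARRIVALS:
        Q_next = tuple(min((0 if m == u else Q[m]) + a[m], N[m]) for m in range(M))
        cost += prob * V[Q_next]
    return cost


def rvia(n_iter, structured):
    """Run n_iter sweeps; return (V, policy, number of full minimizations)."""
    V = {Q: 0.0 for Q in STATES}
    ref = STATES[0]                      # reference state
    mu, n_min = {}, 0
    for _ in range(n_iter):
        V_new, mu = {}, {}
        for Q in STATES:
            chosen = None
            if structured:
                for u in range(M):
                    # Reuse u if it was selected at Q - e_u in this sweep.
                    if Q[u] > 0 and mu[Q[:u] + (Q[u] - 1,) + Q[u + 1:]] == u:
                        chosen = u
                        break
            if chosen is None:
                costs = [action_cost(V, Q, u) for u in range(M)]
                chosen = min(range(M), key=costs.__getitem__)
                V_new[Q] = costs[chosen]
                n_min += 1               # a brute-force minimization was needed
            else:
                V_new[Q] = action_cost(V, Q, chosen)
            mu[Q] = chosen
        V = {Q: v - V_new[ref] for Q, v in V_new.items()}   # normalization
    return V, mu, n_min


V_std, mu_std, min_std = rvia(100, structured=False)
V_str, mu_str, min_str = rvia(100, structured=True)
print("full minimizations:", min_std, "vs", min_str)
```

In this sketch the structured variant evaluates a single action whenever the switch condition fires, so it performs fewer brute-force minimizations while producing the same relative values as the standard sweep.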

B. Structured Policy Iteration Algorithm 

The optimal policy $\mu^*$ in (11) can also be computed using
the policy iteration algorithm (PIA), which is another widely
used method for solving infinite horizon average cost MDPs
[24, Chapter 8.6]. We summarize PIA for computing $\mu^*$ in
Algorithm 3. According to [24, Theorem 8.6.6], Algorithm 3
converges in a finite number of iterations to the optimal
policy $\mu^*$ in (11). In other words, there exists a finite $\bar{n}$ such
that $\mu^*_n = \mu^*$ for all $n \ge \bar{n}$. Note that, in Step 3 of the
standard PIA in Algorithm 3, a brute-force minimization over
$M$ actions needs to be computed for each $\mathbf{Q} \in \mathcal{Q}$, which can
be computationally complex when $M$ is large.

By exploiting the structural properties of the optimal policy
in Theorems 1 and 2, we modify (22) in Step 3 (policy
update) of Algorithm 3 to reduce its computational complexity.
The modified step is given by Algorithm 4 (structured policy
update). Replacing (22) in Step 3 of Algorithm 3 with Algorithm 4,
we obtain a low complexity PIA, which is referred
to as the structured policy iteration algorithm (SPIA). From
[24, Chapter 8.11.2], we can see that SPIA also converges in
a finite number of iterations to the optimal policy $\mu^*$ in (11)
and hence is an optimal algorithm.

Algorithm 4 Structured Policy Update

1: if $\exists u \in \mathcal{M}$ such that $\mu^*_{n+1}(\mathbf{Q} - \mathbf{e}_u) = u$ (uniform case),
or $\exists u \in \mathcal{M}$ and $k \in \mathcal{K}$ such that $\mu^*_{n+1}(\mathbf{Q} - \mathbf{E}_{u,k}) = u$ and
$\mathbf{Q} \succeq \mathbf{Q} - \mathbf{E}_{u,k}$ (nonuniform case), then

$$\mu^*_{n+1}(\mathbf{Q}) = u.$$

2: else

$$\mu^*_{n+1}(\mathbf{Q}) = \arg\min_{u \in \mathcal{M}} \left\{ g(\mathbf{Q},u) + E[V_n(\mathbf{Q}')] \right\}.$$

3: end if

Footnote to (21): The solution to (21) can be obtained directly using Gaussian elimination
or iteratively using the relative value iteration method [14].

C. Complexity Comparison 

We compare the computational complexity of the proposed
structured optimal algorithms (SRVIA and SPIA) with the
standard optimal algorithms (RVIA and PIA) for each iteration,
as illustrated in Table III. Specifically, in the structured
value update step (Algorithm 2) and the structured policy
update step (Algorithm 4), if the condition is satisfied for
a certain queue state, then we do not need to perform the
corresponding minimization over $M$ actions. This leads to a
computational saving of $O(M|\mathcal{Q}|)$ [26]. There are in total $|\mathcal{Q}|$
states. Thus, for each iteration, the computational complexity
reduction of the structured value/policy update is $O(M|\mathcal{Q}|^2)$.
From Table III, we can see that, although the proposed structured
optimal algorithms suffer from the exponential growth
of the state space, the computational complexity reduction
also grows exponentially with the state space. Therefore, the
computational complexity reduction of the proposed structured
optimal algorithms is remarkable, considering that the optimality
is not sacrificed.

Note that the two proposed low-complexity optimal algorithms
still suffer from the curse of dimensionality, i.e., the
exponential growth of the state space [14]. This curse
of dimensionality comes from the complex coupling structure
of the request queue model, and is embedded in the optimal
control design for the considered MDP. To the best of our
knowledge, except for very special cases, it is not possible
to break the curse of dimensionality without any loss of
optimality.

                 Complexity of          Complexity of Proposed     Complexity
                 Standard Algs.         Structured Algs.           Reduction
Value Update     $O(M|\mathcal{Q}|^2)$  $O(M|\mathcal{Q}|^2)$      $O(M|\mathcal{Q}|^2)$
Policy Update    $O(M|\mathcal{Q}|^2)$  $O(M|\mathcal{Q}|^2)$      $O(M|\mathcal{Q}|^2)$

TABLE III: Complexity comparison between the proposed and standard
optimal algorithms in each iteration.

VII. Low Complexity Suboptimal Solution 

To further reduce the complexity of the proposed structured 
optimal algorithms and relieve the curse of dimensionality, we 
would like to develop low-complexity suboptimal solutions. 
Note that the structural properties of the optimal policy may 
be one key factor that leads to good performance. Thus, in this 
section, we focus on the design of suboptimal solutions which 
can maintain the switch structures. Specifically, based on a 
randomized base policy, we first propose a low complexity 
suboptimal deterministic policy using approximate dynamic 
programming m, which has better performance than the 
randomized base policy and possesses similar structural prop¬ 
erties to the optimal policy. Then, based on these structural 
properties, we develop a low complexity structured algorithm 
to compute the proposed policy. Note that, with abuse of
notation, in this section, we also use $Q_m$ and $\mathcal{Q}_m$ to represent
$\mathbf{Q}_m$ and $\boldsymbol{\mathcal{Q}}_m$ in the nonuniform case.

A. Low Complexity Suboptimal Policy 

The switch structural properties of the optimal policy stem 
from the monotonicity property of the value function. There¬ 
fore, to maintain the switch structures in designing a subop¬ 
timal solution, we consider a value function decomposition 
method that can preserve the structural properties of the value 
function. Based on this decomposition, we propose a low 
complexity suboptimal deterministic policy, which will be 
shown to possess similar structural properties to the optimal 
policy. We first introduce a randomized base policy. 

Definition 2 (Randomized Base Policy): A randomized
base policy for the multicast scheduling control $\hat{\mu}$ is given by
a distribution on the multicast scheduling action space $\mathcal{M}$.

We restrict our attention to randomized unichain base policies.
Denote $\hat{\theta}$ and $\{\hat{V}(\mathbf{Q})\}$ as the average cost and the
value function under a randomized unichain base policy $\hat{\mu}$,
respectively. By [14, Proposition 4.2.2] and following the
proof of Lemma 1, there exists $(\hat{\theta}, \{\hat{V}(\mathbf{Q})\})$ satisfying:

$$\hat{\theta} + \hat{V}(\mathbf{Q}) = E^{\hat{\mu}}[g(\mathbf{Q},u)] + \sum_{\mathbf{Q}' \in \mathcal{Q}} E^{\hat{\mu}}\big[\Pr[\mathbf{Q}'|\mathbf{Q},u]\big]\, \hat{V}(\mathbf{Q}'), \quad \forall \mathbf{Q} \in \mathcal{Q}, \tag{23}$$

where g{Q,u) and Pr[Q'|Q,M] are given in ((2) and dTTl) . 
respectively. In the following lemma, we show that $\hat{V}(\mathbf{Q})$ has
a separable structure.

Lemma 7 (Separable Structure of $\hat{V}(\mathbf{Q})$): Given any randomized
unichain base policy $\hat{\mu}$, the value function $\{\hat{V}(\mathbf{Q})\}$
in (23) can be expressed as $\hat{V}(\mathbf{Q}) = \sum_{m \in \mathcal{M}} \hat{V}_m(Q_m)$, where
$\{\hat{V}_m(Q_m)\}$ satisfies:

$$\hat{\theta}_m + \hat{V}_m(Q_m) = E^{\hat{\mu}}[g_m(Q_m,u)] + \sum_{Q'_m \in \mathcal{Q}_m} E^{\hat{\mu}}\big[\Pr[Q'_m|Q_m,u]\big]\, \hat{V}_m(Q'_m), \quad \forall Q_m \in \mathcal{Q}_m, \tag{24}$$

for all $m \in \mathcal{M}$. Here, $\hat{\theta}_m$ and $\hat{V}_m(Q_m)$ denote the per-content
average cost and value function under $\hat{\mu}$, respectively,
$g_m(Q_m,u) = Q_m + w_f f(u) + w_p \mathbf{1}(u = m)p(m)$ in the
uniform case, $g_m(Q_m,u) = \sum_{k \in \mathcal{K}} Q_{m,k} + w_f f(u) + w_p \mathbf{1}(u = m)p(m,k^\dagger(Q_m,u))$
with $k^\dagger(Q_m,u) = \max\{k \mid Q_{m,k} > 0\}$ in
the nonuniform case, and $\Pr[Q'_m|Q_m,u] \triangleq \Pr[Q_m(t+1) = Q'_m \mid Q_m(t) = Q_m, u(t) = u]$.

Proof: Please see Appendix J. ■

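To alleviate the curse of dimensionality, we approximate the
value function $V(\mathbf{Q})$ in (10) by $\hat{V}(\mathbf{Q})$, i.e.,

$$V(\mathbf{Q}) \approx \hat{V}(\mathbf{Q}) = \sum_{m \in \mathcal{M}} \hat{V}_m(Q_m), \tag{25}$$

where $\{\hat{V}_m(Q_m)\}$ is given by the per-content fixed point equation
in (24). Then, we obtain the following low complexity
deterministic policy $\hat{\mu}^*$:

$$\hat{\mu}^*(\mathbf{Q}) = \arg\min_{u \in \mathcal{M}} \Big\{ g(\mathbf{Q},u) + \sum_{m \in \mathcal{M}} E[\hat{V}_m(Q'_m)] \Big\}, \quad \forall \mathbf{Q} \in \mathcal{Q}. \tag{26}$$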


Remark 4: To obtain $\hat{\mu}^*$ in (26), we only need to compute
$\{\hat{V}_m(Q_m)\}$ (a total of $O(\sum_m |\mathcal{Q}_m|)$ values) via solving (24)
for all $m$. The computational complexity is much lower than
computing $\{V(\mathbf{Q})\}$ (a total of $O(\prod_m |\mathcal{Q}_m|)$ values) via solving
(10) in obtaining $\mu^*$ in (11).
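The gap described in Remark 4 is easy to quantify. The short sketch below compares the number of per-content values with the number of joint values, assuming (as an illustration) that each per-content queue takes the $N_m + 1$ values $0, \ldots, N_m$ in the uniform case; the function name is hypothetical.

```python
from math import prod


def value_counts(queue_limits):
    """Return (sum_m |Q_m|, prod_m |Q_m|), assuming |Q_m| = N_m + 1."""
    sizes = [n + 1 for n in queue_limits]
    return sum(sizes), prod(sizes)


# N_m = 10 for all m, for M = 2, 3, 4 contents.
for M in (2, 3, 4):
    per_content, joint = value_counts([10] * M)
    print(M, per_content, joint)
```

The per-content count grows linearly in $M$, whereas the joint count grows exponentially, which is why the decomposition relieves the curse of dimensionality.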

Remark 5: Note that, the value function decomposition 
method adopted here is different from most existing approx¬ 
imate approaches Eol, ea. Our approach does not rely on 
choices of specific basis functions. Moreover, our approach 
can maintain similar switch structural properties to the optimal
policy, which will be shown in Theorem 4.


B. Properties of Suboptimal Policy 

1) Performance comparison: The proposed deterministic
policy $\hat{\mu}^*$ always achieves better performance than the randomized
unichain base policy $\hat{\mu}$, as summarized in the following
theorem.

Theorem 3 (Performance Improvement): If $\Pr[\mathbf{Q}'|\mathbf{Q},u] \neq \Pr[\mathbf{Q}'|\mathbf{Q},u']$
for any $u \neq u'$ and $\mathbf{Q} \in \mathcal{Q}$, then we have
$\hat{\theta}^*(\mathbf{Q}) < \hat{\theta}$ for all $\mathbf{Q} \in \mathcal{Q}$, where $\hat{\theta}^*(\mathbf{Q})$ is the average system
cost under the proposed solution starting from $\mathbf{Q}$ and $\hat{\theta}$ is
the average system cost under any randomized base policy.

Proof: This result follows directly from lIZTil . ■ 

2) Structural properties: We first introduce the state-action
cost function for $\hat{\mu}^*$:

$$\hat{J}(\mathbf{Q},u) \triangleq g(\mathbf{Q},u) + \sum_{m \in \mathcal{M}} E[\hat{V}_m(Q'_m)]. \tag{27}$$

Note that $\hat{J}(\mathbf{Q},u)$ is related to the R.H.S. of (26). Along
the lines of the structural analysis for the optimal policy in
Sections IV and V, we can show that the proposed deterministic
low complexity suboptimal policy has similar structural
properties to those in Theorems 1 and 2. This similarity may
be one key reason for the good performance of the proposed
suboptimal solution, which will be shown in the numerical
section.

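Theorem 4 (Structural Properties of Suboptimal Policy $\hat{\mu}^*$):
Under any randomized base unichain policy $\hat{\mu}$, the structural
properties of the corresponding deterministic policy $\hat{\mu}^*$ are as
follows.

1) In the uniform case, $\hat{\mu}^*$ has a switch structure, i.e., for
all $u \in \mathcal{M}$, we have

$$\hat{\mu}^*(\mathbf{Q}) = u, \ \text{if } Q_u \ge \hat{s}_u(\mathbf{Q}_{-u}), \tag{28}$$

where the switch curve for content $u$ is given by

$$\hat{s}_u(\mathbf{Q}_{-u}) \triangleq \begin{cases} \min \hat{\mathcal{S}}_u(\mathbf{Q}_{-u}), & \text{if } \hat{\mathcal{S}}_u(\mathbf{Q}_{-u}) \neq \emptyset \\ \infty, & \text{otherwise} \end{cases}$$

with $\hat{\mathcal{S}}_u(\mathbf{Q}_{-u}) \triangleq \{Q_u \mid \hat{J}(\mathbf{Q},u) \le \hat{J}(\mathbf{Q},v)\ \forall v \in \mathcal{M}, v \neq u\}$. Here,
$\mathbf{Q}_{-u}$ is defined in Theorem 1.

2) In the nonuniform case, $\hat{\mu}^*$ has a partial switch structure,
i.e., for all $u \in \mathcal{M}$ and $k \in \mathcal{K}$, we have

$$\hat{\mu}^*(\mathbf{Q}) = u, \ \text{if } Q_{u,k} \ge \hat{s}_{u,k}(\mathbf{Q}_{-u,-k}) \text{ and condition (a) or (b) holds}, \tag{29}$$

where condition (a) is $k < k^\dagger(k,\mathbf{Q}_u)$, condition (b) is $k > k^\dagger(k,\mathbf{Q}_u)$
and $\hat{s}_{u,k}(\mathbf{Q}_{-u,-k}) > 0$, and the switch curve for
content-user pair $(u,k)$ is given by

$$\hat{s}_{u,k}(\mathbf{Q}_{-u,-k}) \triangleq \begin{cases} \min \hat{\mathcal{S}}_{u,k}(\mathbf{Q}_{-u,-k}), & \text{if } \hat{\mathcal{S}}_{u,k}(\mathbf{Q}_{-u,-k}) \neq \emptyset \\ \infty, & \text{otherwise} \end{cases}$$

with $\hat{\mathcal{S}}_{u,k}(\mathbf{Q}_{-u,-k}) \triangleq \{Q_{u,k} \mid \hat{J}(\mathbf{Q},u) \le \hat{J}(\mathbf{Q},v)\ \forall v \in \mathcal{M}, v \neq u\}$.
Here, $k^\dagger(k,\mathbf{Q}_u)$ and $\mathbf{Q}_{-u,-k}$ are defined in Theorem 2.

Proof: Please see Appendix K. ■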

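Similarly, Theorem 4 indicates the following results.

1) In the uniform case, for all $\mathbf{Q} \in \mathcal{Q}$, we have

$$\hat{\mu}^*(\mathbf{Q}) = u \;\Rightarrow\; \hat{\mu}^*(\mathbf{Q} + \mathbf{e}_u) = u. \tag{30}$$

2) In the nonuniform case, for all $\mathbf{Q} \in \mathcal{Q}$, we have

$$\hat{\mu}^*(\mathbf{Q}) = u \text{ and } \mathbf{Q} + \mathbf{E}_{u,k} \succeq \mathbf{Q} \;\Rightarrow\; \hat{\mu}^*(\mathbf{Q} + \mathbf{E}_{u,k}) = u. \tag{31}$$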


C. Structured Suboptimal Algorithm 

By making use of the relationship between $\hat{\mu}$ and $\hat{\mu}^*$ and
the structural properties of $\hat{\mu}^*$ in Theorem 4, we can develop
a low complexity algorithm to obtain $\hat{\mu}^*$ in (26), which is
summarized in Algorithm 5. We refer to Algorithm 5 as the
structured suboptimal algorithm (SSA).

Now we show that SSA has a significantly lower computational
complexity than the two optimal algorithms proposed
in Section VI, i.e., SRVIA and SPIA. First, we compare the
complexity of SSA and SRVIA. If we compute $\{\hat{V}_m(Q_m)\}$
in Step 1 of SSA using the relative value iteration method,
then the numbers of iterations required for Step 1 of SSA and
SRVIA are comparable. However, as illustrated in Remark 4,
for each iteration, the number of value functions required to be
updated in Step 1 of SSA is much smaller than that in SRVIA.
In addition, the number of optimizations required to be solved
in Step 2 of SSA is comparable to that in each iteration of
SRVIA. Thus, SSA has a much lower computational complexity
than SRVIA. Next, we compare the complexity of SSA and
SPIA. SSA is similar to one iteration of SPIA. As illustrated
in Remark 4, the number of value functions required to be
updated in Step 1 of SSA is much smaller than that in each
iteration of the policy evaluation step of SPIA. In addition,
the number of optimizations required to be solved in Step 2 of
SSA is comparable to that in each iteration of the structured
policy update step of SPIA. Thus, SSA has a much lower
computational complexity than SPIA.

Algorithm 5 Structured Suboptimal Algorithm

1: Given a randomized base unichain policy $\hat{\mu}$, compute the
per-content value function $\{\hat{V}_m(Q_m)\}$ for all $m \in \mathcal{M}$ by
solving the corresponding linear system of equations in (24).

2: Obtain the proposed policy $\hat{\mu}^*$, where for each $\mathbf{Q} \in \mathcal{Q}$,
$\hat{\mu}^*(\mathbf{Q})$ is such that:

if $\exists u \in \mathcal{M}$ such that $\hat{\mu}^*(\mathbf{Q} - \mathbf{e}_u) = u$ (uniform case),
or $\exists u \in \mathcal{M}$ and $k \in \mathcal{K}$ such that $\hat{\mu}^*(\mathbf{Q} - \mathbf{E}_{u,k}) = u$ and
$\mathbf{Q} \succeq \mathbf{Q} - \mathbf{E}_{u,k}$ (nonuniform case), then
    $\hat{\mu}^*(\mathbf{Q}) = u$.
else
    Compute $\hat{\mu}^*(\mathbf{Q})$ by (26).
end if

Footnote to (24): The solution to (24) can be obtained directly using Gaussian elimination
or iteratively using the relative value iteration method [14].

Fig. 7: Average costs versus $w_p$ and $w_f$ in the uniform case at $M = 3$, $|\mathcal{C}| = 2$, $K = 2$, and $\alpha = 0.75$.

Fig. 8: Average costs versus $w_p$ and $w_f$ in the nonuniform case at $M = 3$, $|\mathcal{C}| = 2$, $K = 2$, and $\alpha = 0.75$. (a) Average system cost. (b) Average delay cost. (c) Average fetching cost. (d) Average power cost.

VIII. Numerical results and discussion 

In this section, we evaluate the performance of the proposed
optimal and suboptimal solutions via numerical examples.
In the simulations, we consider that in each slot, each user
requests one content, which is content $m$ with probability $P_m$.
We assume that $\{P_m\}$ follows a (normalized) Zipf distribution
with parameter $\alpha$ [28]. We consider that each content is of the
same size, and the BS stores the most popular contents. We
set $c(m) = 3$ for all $m$. In addition, for all $m$, in the uniform
case, we set $p(m,k) = 2$ for all $k$, and in the nonuniform case,
we set $p(m,k) = 2$ for $k = 1, \cdots, K/2$ and $p(m,k) = 4$ for
$k = K/2 + 1, \cdots, K$.
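The popularity profile used in the simulations can be generated as follows. This is a minimal sketch that assumes the common normalized Zipf form $P_m = m^{-\alpha} / \sum_{j=1}^{M} j^{-\alpha}$, which is not spelled out in this section; the function name is hypothetical.

```python
def zipf_popularity(M: int, alpha: float) -> list[float]:
    """Normalized Zipf probabilities P_1 >= ... >= P_M (assumed form)."""
    weights = [m ** (-alpha) for m in range(1, M + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Example: M = 3 contents with alpha = 0.75, as in Figs. 7 and 8.
P = zipf_popularity(3, 0.75)
print([round(p, 3) for p in P])
```

A larger $\alpha$ concentrates more probability on the first few contents, while $\alpha = 0$ gives the uniform distribution.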

First, we compare the average costs of the proposed optimal 
and suboptimal policies with three baseline policies, i.e., a 
randomized base policy in Definition 2, the longest-queue-first
policy in iflbll . and a myopic policy ll20ll . In particular, in each 
slot, the randomized base policy chooses one content randomly 
for multicasting according to the distribution $\{P_m\}$ on $\mathcal{M}$ and
the longest-queue-first policy schedules the content with the
longest request queue for multicasting. In each slot, the myopic
policy chooses the multicast scheduling action that minimizes
a cost function $C(\mathbf{Q},u)$, i.e., $u(t) = \arg\min_{u \in \mathcal{M}} C(\mathbf{Q},u)$,
where

$$C(\mathbf{Q},u) \triangleq w_f f(u) + w_p p(\mathbf{Q},u) - d(\mathbf{Q},u), \quad \forall \mathbf{Q} \in \mathcal{Q}, u \in \mathcal{M},$$

with $d(\mathbf{Q},u) \triangleq Q_u$ in the uniform case and $d(\mathbf{Q},u) \triangleq \sum_k Q_{u,k}$
in the nonuniform case. This policy determines the
scheduling action myopically, without accurately considering
the impact of the action on the future costs. Note that this
myopic policy can also be treated as an approximate solution
to the considered MDP through approximating $V(\mathbf{Q})$ in the
Bellman equation with $\sum_m Q_m$ (uniform case) or $\sum_{m,k} Q_{m,k}$
(nonuniform case).

Fig. 9: Average system cost versus the Zipf parameter $\alpha$ in the
uniform and nonuniform cases at $M = 30$, $|\mathcal{C}| = 13$, $K = 30$,
and $w_p = w_f = 5$.
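For the uniform case, where the power cost does not depend on the queue state, the myopic baseline reduces to a one-line rule. The sketch below is illustrative only; the function name, the 0-based content indices and the example numbers are hypothetical.

```python
def myopic_action(Q, power, fetch, w_p=1.0, w_f=1.0):
    """Return argmin_u C(Q,u) with C(Q,u) = w_f f(u) + w_p p(u) - Q_u."""
    costs = [w_f * fetch[u] + w_p * power[u] - Q[u] for u in range(len(Q))]
    return min(range(len(Q)), key=costs.__getitem__)


# Content 0 is cached (zero fetching cost); content 1 must be fetched.
print(myopic_action([1, 5], power=[2.0, 2.0], fetch=[0.0, 3.0]))
print(myopic_action([1, 2], power=[2.0, 2.0], fetch=[0.0, 3.0]))
```

With a long queue for the uncached content the rule pays the fetching cost, and with a short one it falls back to the cached content, which is the one-step tradeoff this baseline captures.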

Fig. 7 and Fig. 8 illustrate the average system, delay, power
and fetching costs versus the weights of the power and fetching
costs (i.e., $w_p$ and $w_f$) in the uniform and nonuniform cases,
respectively. It can be seen that the average system costs of
the proposed optimal and suboptimal policies are very close
to each other and are lower than those of the longest-queue-first
policy and the myopic policy. The reason is that the
proposed two policies can make foresighted decisions by better
utilizing system state information and balancing the current
cost and the future costs. Moreover, we can observe that for
the optimal and suboptimal policies, in the uniform case, the
average delay cost increases with $w_f$ and does not change
with $w_p$, and the average fetching cost decreases with $w_f$; in
the nonuniform case, the average delay cost increases with
$w_f$ and $w_p$, the average power cost decreases with $w_p$, and
the average fetching cost decreases with $w_f$. This reveals the
tradeoff among the delay, power, and fetching costs of the
optimal and suboptimal policies.

Fig. 9 illustrates the average system cost versus the Zipf
parameter $\alpha$ in the uniform and nonuniform cases. The parameter
$\alpha$ determines the "peakiness" of the content popularity
distribution, i.e., a large $\alpha$ indicates that a small number of
contents accounts for the majority of content requests. It can
be seen that with the increase of $\alpha$, the average system cost of
the proposed suboptimal policy decreases and the performance
gains over the three baseline policies increase. This indicates
that the proposed suboptimal policy can utilize caching more
effectively as the content popularity distribution gets steeper.

Fig. 10 and Fig. 11 illustrate the average system cost and
the average system cost per user versus the number of users
$K$ in the uniform and nonuniform cases, respectively. We can
observe that when the average request arrival rate increases (as
$K$ increases), the average system costs per user of all policies
decrease. This reveals the benefit of the multicast transmission.

Next, we compare the computational complexity of the
two standard optimal algorithms (RVIA and PIA), the two
proposed low-complexity optimal algorithms (SRVIA and
SPIA), and the proposed low-complexity suboptimal algorithm
(SSA) in Table IV for the uniform and nonuniform cases.
It can be seen that SRVIA and SPIA have much lower
computational complexity than RVIA and PIA, respectively
(with reductions of roughly 20% to 38% in computation time
according to the entries of Table IV). Note
that the computation times of the four optimal algorithms
and the computational reductions of the proposed SRVIA and
SPIA have the same order of growth. Therefore, although
the proposed low-complexity optimal algorithms suffer from
the curse of dimensionality, their computational complexity
reductions are remarkable. Moreover, we can observe that SSA
has a significantly lower computational complexity than all the
optimal algorithms. These observations verify the discussions in Sections
VI and VII.

Fig. 10: Average system cost and system cost per user versus number
of users $K$ in the uniform cases at $M = 30$, $|\mathcal{C}| = 13$, A = 30,
$w_p = w_f = 5$ and $\alpha = 0.75$. (a) Average system cost. (b) Average system cost per user.

Fig. 11: Average system cost and system cost per user versus number
of users $K$ in the nonuniform cases at $M = 30$, $|\mathcal{C}| = 13$, A = 30,
$w_p = w_f = 5$ and $\alpha = 0.75$. (a) Average system cost. (b) Average system cost per user.
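The relative savings of the structured algorithms can be computed directly from the timings reported in Table IV. The sketch below copies those timings as they appear in the table; the dictionary layout and function name are hypothetical.

```python
# Computation times (sec) from Table IV: (RVIA, SRVIA, PIA, SPIA).
TIMES = {
    ("uniform", 2): (0.0138, 0.0085, 0.0119, 0.0077),
    ("uniform", 3): (0.577, 0.428, 0.614, 0.476),
    ("uniform", 4): (23.42, 17.31, 20.48, 16.33),
    ("nonuniform", 2): (0.380, 0.295, 0.419, 0.337),
    ("nonuniform", 3): (15.55, 12.37, 18.45, 14.41),
    ("nonuniform", 4): (2315.8, 1783.2, 2473.5, 1878.8),
}


def reduction(standard: float, structured: float) -> float:
    """Percentage of computation time saved by the structured variant."""
    return 100.0 * (standard - structured) / standard


# Map each (case, M) to (SRVIA vs RVIA, SPIA vs PIA) reductions in percent.
reductions = {}
for key, (rvia_t, srvia_t, pia_t, spia_t) in TIMES.items():
    reductions[key] = (reduction(rvia_t, srvia_t), reduction(pia_t, spia_t))
    print(key, [f"{r:.1f}%" for r in reductions[key]])
```

Running the loop shows how the savings vary across the six settings rather than collapsing them into a single figure.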



Case         M    RVIA      SRVIA     PIA       SPIA      SSA
Uniform      2    0.0138    0.0085    0.0119    0.0077    0.0016
Uniform      3    0.577     0.428     0.614     0.476     0.0063
Uniform      4    23.42     17.31     20.48     16.33     0.0828
Nonuniform   2    0.380     0.295     0.419     0.337     0.0076
Nonuniform   3    15.55     12.37     18.45     14.41     0.305
Nonuniform   4    2315.8    1783.2    2473.5    1878.8    32.57

TABLE IV: Average Matlab computation time (sec) for different
algorithms in the uniform and nonuniform cases. $|\mathcal{C}| = 1$, $w_p = w_f = 1$,
$K = 2$, $\alpha = 0.75$, $N_m = 10$ for all $m$ and $N_{m,k} = 4$ for
all $m, k$.

IX. Extension to Markov-Modulated Request Arrivals


In this section, we extend the structural analysis for the
i.i.d. request arrivals to the Markov-modulated request arrivals.
Specifically, we assume that for each $m$ and $k$, the request
arrival process $\{A_{m,k}(t)\}$ evolves according to an ergodic finite-state
Markov chain with the transition probability $\Pr[A_{m,k}(t+1) \mid A_{m,k}(t)]$.
In this case, the system state consists of the request
queue state $\mathbf{Q}$ and the request arrival state $\mathbf{A}$. We define
the stationary multicast scheduling policy $\tilde{\mu}$ as a mapping
from system state space $\mathcal{Q} \times \mathcal{A}$ to the multicast scheduling
action space $\mathcal{M}$ and formulate the corresponding system cost
minimization problem. Similar to Lemma 1, we have the
following Bellman equation:

$$\theta + V(\mathbf{Q},\mathbf{A}) = \min_{u \in \mathcal{M}} \left\{ g(\mathbf{Q},u) + E[V(\mathbf{Q}',\mathbf{A}')] \right\}, \quad \forall \mathbf{Q}, \mathbf{A},$$

where the expectation is taken over the distributions of $\mathbf{A}$ and
$\mathbf{A}'$, and $\mathbf{Q}'$ is defined in Lemma 1. Following the analysis
in Sections IV and V, we can show that the optimal policy
$\tilde{\mu}^*$ for the Markov-modulated request arrival model possesses
similar structural properties to the optimal policy $\mu^*$ for the
i.i.d. request arrival model.

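Theorem 5 (Structural Properties of Optimal Policy $\tilde{\mu}^*$):
For Markov-modulated request arrivals, the structural
properties of the optimal policy $\tilde{\mu}^*$ are as follows.

1) In the uniform case, $\tilde{\mu}^*$ has a switch structure, i.e., for
all $u \in \mathcal{M}$, we have

$$\tilde{\mu}^*(\mathbf{Q},\mathbf{A}) = u, \ \text{if } Q_u \ge s_u(\mathbf{Q}_{-u},\mathbf{A}). \tag{32}$$

2) In the nonuniform case, $\tilde{\mu}^*$ has a partial switch structure,
i.e., for all $u \in \mathcal{M}$ and $k \in \mathcal{K}$, we have

$$\tilde{\mu}^*(\mathbf{Q},\mathbf{A}) = u, \ \text{if } Q_{u,k} \ge s_{u,k}(\mathbf{Q}_{-u,-k},\mathbf{A}) \text{ and condition (a) or (b) holds}, \tag{33}$$

where condition (a) is $k < k^\dagger(k,\mathbf{Q}_u)$, condition (b) is $k > k^\dagger(k,\mathbf{Q}_u)$
and $s_{u,k}(\mathbf{Q}_{-u,-k},\mathbf{A}) > 0$.

The switch curves $s_u(\mathbf{Q}_{-u},\mathbf{A})$ and $s_{u,k}(\mathbf{Q}_{-u,-k},\mathbf{A})$ in (32)
and (33) are defined in a similar manner to the switch curves
in Theorems 1 and 2, respectively.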

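Similarly, Theorem 5 implies the following results.

1) In the uniform case, for all $\mathbf{Q}$ and $\mathbf{A}$, we have

$$\tilde{\mu}^*(\mathbf{Q},\mathbf{A}) = u \;\Rightarrow\; \tilde{\mu}^*(\mathbf{Q} + \mathbf{e}_u,\mathbf{A}) = u. \tag{34}$$

2) In the nonuniform case, for all $\mathbf{Q}$ and $\mathbf{A}$, we have

$$\tilde{\mu}^*(\mathbf{Q},\mathbf{A}) = u,\ \mathbf{Q} + \mathbf{E}_{u,k} \succeq \mathbf{Q} \;\Rightarrow\; \tilde{\mu}^*(\mathbf{Q} + \mathbf{E}_{u,k},\mathbf{A}) = u. \tag{35}$$

As in Sections VI and VII, the structural properties in (34) and
(35) can also be utilized to design low-complexity optimal and
suboptimal algorithms.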


X. Conclusion 

In this paper, we consider the optimal dynamic multicast 
scheduling to jointly minimize the average delay, power, 
and fetching costs for cache-enabled content-centric wireless 
networks. We formulate this stochastic optimization problem
as an infinite horizon average cost MDP. We show that the

optimal policy has a switch structure in the uniform case and 
a partial switch structure in the nonuniform case. Moreover, 
in the uniform case with two contents, we show that the 
switch curve is monotonically non-decreasing. Based on these 
structural results, we propose two low-complexity optimal 
algorithms. Motivated by the switch structures of the optimal 
policy, to further reduce the complexity, we also propose a 
low-complexity suboptimal policy, which has similar structural 
properties to the optimal policy, and develop a low-complexity 
algorithm to compute this policy. These analytical results hold 
for both i.i.d. request arrival and Markov-modulated request 
arrival models. 


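Appendix A: Proof of Lemma 1

By Proposition 4.2.5 in [14], the Weak Accessibility (WA)
condition holds for unichain policies. Thus, by Proposition
4.2.3 and Proposition 4.2.1 in [14], the optimal system cost of
the MDP in Problem 1 is the same for all initial states and the
solution $(\theta, V(\mathbf{Q}))$ to the following Bellman equation exists.

$$\theta + V(\mathbf{Q}) = \min_{u \in \mathcal{M}} \Big\{ g(\mathbf{Q},u) + \sum_{\mathbf{Q}' \in \mathcal{Q}} \Pr[\mathbf{Q}'|\mathbf{Q},u]\, V(\mathbf{Q}') \Big\}, \quad \forall \mathbf{Q} \in \mathcal{Q}. \tag{36}$$

The transition probability is given by

$$\Pr[\mathbf{Q}'|\mathbf{Q},u] = \Pr[\mathbf{Q}(t+1) = \mathbf{Q}' \mid \mathbf{Q}(t) = \mathbf{Q}, u(t) = u] = E\big[\Pr[\mathbf{Q}(t+1) = \mathbf{Q}' \mid \mathbf{Q}(t) = \mathbf{Q}, u(t) = u, \mathbf{A}(t) = \mathbf{A}]\big], \tag{37}$$

where

$$\Pr[\mathbf{Q}(t+1) = \mathbf{Q}' \mid \mathbf{Q}(t) = \mathbf{Q}, u(t) = u, \mathbf{A}(t) = \mathbf{A}] = \begin{cases} 1, & \text{if } \mathbf{Q}' \text{ satisfies the request queue dynamics (uniform or nonuniform case)} \\ 0, & \text{otherwise.} \end{cases}$$

By substituting (37) into (36), we have (10), which completes
the proof.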


Appendix B: Prooe oe Lemma[2] 

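We prove Lemma 2 using RVIA and induction.

First, we introduce RVIA [14, Chapter 4.3]. For each state $\mathbf{Q} \in \mathcal{Q}$, let $V_n(\mathbf{Q})$ be the value function in the $n$th iteration, where $n = 0, 1, \cdots$. Define

$$J_{n+1}(\mathbf{Q}, u_n) = g(\mathbf{Q}, u_n) + \mathbb{E}[V_n(\mathbf{Q}')], \tag{38}$$

where $g(\mathbf{Q}, u_n) = \sum_{m} Q_m + w_p p(u_n) + w_f f(u_n)$ and $\mathbf{Q}' = (Q'_m)_{m \in \mathcal{M}}$ with $Q'_m = \min\{\mathbf{1}(u_n \neq m) Q_m + A_m, N_m\}$ in the uniform case; $g(\mathbf{Q}, u_n) = \sum_{m,k} Q_{m,k} + w_p p(u_n, k^{\dagger}(\mathbf{Q}, u_n)) + w_f f(u_n)$ with $k^{\dagger}(\mathbf{Q}, u_n) = \max\{k \mid Q_{u_n,k} > 0\}$, $\mathbf{Q}' = (Q'_{m,k})_{m \in \mathcal{M}, k \in \mathcal{K}}$ and $Q'_{m,k} = \min\{\mathbf{1}(u_n \neq m) Q_{m,k} + A_{m,k}, N_{m,k}\}$ in the nonuniform case.

Note that $J_{n+1}(\mathbf{Q}, u_n)$ is related to the R.H.S. of the Bellman equation in (10). We refer to $J_{n+1}(\mathbf{Q}, u_n)$ as the state-action cost function in the $n$th iteration. For each $\mathbf{Q}$, RVIA calculates $V_{n+1}(\mathbf{Q})$ according to

$$V_{n+1}(\mathbf{Q}) = \min_{u_n} J_{n+1}(\mathbf{Q}, u_n) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n), \quad \forall n, \tag{39}$$

where $J_{n+1}(\mathbf{Q}, u_n)$ is given by (38) and $\mathbf{Q}^{\S} \in \mathcal{Q}$ is some fixed state. Under any initialization of $V_0(\mathbf{Q})$, the generated sequence $\{V_n(\mathbf{Q})\}$ converges to $V(\mathbf{Q})$ [14, Proposition 4.3.2], i.e.,

$$\lim_{n \to \infty} V_n(\mathbf{Q}) = V(\mathbf{Q}), \quad \forall \mathbf{Q} \in \mathcal{Q}, \tag{40}$$

where $V(\mathbf{Q})$ satisfies the Bellman equation in (10). Let $\mu_n^*(\mathbf{Q})$ denote the control that attains the minimum of the first term in (39) in the $n$th iteration for all $\mathbf{Q}$, i.e.,

$$\mu_n^*(\mathbf{Q}) = \arg\min_{u_n} J_{n+1}(\mathbf{Q}, u_n), \quad \forall \mathbf{Q} \in \mathcal{Q}. \tag{41}$$

We refer to $\mu_n^*$ as the optimal policy for the $n$th iteration.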
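The RVIA recursion (38), (39) and the greedy policy (41) can be sketched for a small uniform-case instance. The following is an illustrative sketch only: it assumes independent per-content arrival pmfs, a 0-based content index as the action set, the all-zero state as the fixed reference state, and hypothetical per-content power and fetching costs passed as lists `power` and `fetch`.

```python
from itertools import product


def next_state(Q, u, A, N):
    """Uniform-case queue update: Q'_m = min(1(u != m) * Q_m + A_m, N_m)."""
    return tuple(min((0 if m == u else q) + a, n)
                 for m, (q, a, n) in enumerate(zip(Q, A, N)))


def joint_arrivals(arrival_pmfs):
    """List of (A, Pr[A]) assuming independent arrivals across contents."""
    out = []
    for combo in product(*(pmf.items() for pmf in arrival_pmfs)):
        p = 1.0
        for _, p_m in combo:
            p *= p_m
        out.append((tuple(a for a, _ in combo), p))
    return out


def rvia(N, arrival_pmfs, power, fetch, w_p=1.0, w_f=1.0, iters=500, tol=1e-9):
    """Relative value iteration; returns (V, policy) as dicts over states."""
    M = len(N)
    states = list(product(*(range(n + 1) for n in N)))
    arrivals = joint_arrivals(arrival_pmfs)
    ref = states[0]  # fixed reference state (all queues empty, an assumption)

    def J(Q, u, V):
        # State-action cost (38): per-stage cost plus expected value.
        g = sum(Q) + w_p * power[u] + w_f * fetch[u]
        return g + sum(p * V[next_state(Q, u, A, N)] for A, p in arrivals)

    V = {Q: 0.0 for Q in states}  # V_0 = 0
    for _ in range(iters):
        offset = min(J(ref, u, V) for u in range(M))
        V_new = {Q: min(J(Q, u, V) for u in range(M)) - offset for Q in states}
        diff = max(abs(V_new[Q] - V[Q]) for Q in states)
        V = V_new
        if diff < tol:
            break
    # Greedy policy (41) with respect to the last iterate.
    policy = {Q: min(range(M), key=lambda u: J(Q, u, V)) for Q in states}
    return V, policy


V, policy = rvia(N=(3, 3), arrival_pmfs=[{0: 0.7, 1: 0.3}] * 2,
                 power=[1.0, 1.0], fetch=[0.0, 0.5])
```

Subtracting the reference-state term keeps the iterates bounded while leaving the minimizing action unchanged, which is why the structural arguments can be carried out on every iterate $V_n$ and then passed to the limit.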

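Next, we prove Lemma 2 through mathematical induction using RVIA. Denote $\mathbf{Q}^1 = (Q^1_m)_{m \in \mathcal{M}}$ and $\mathbf{Q}^2 = (Q^2_m)_{m \in \mathcal{M}}$.

To prove Lemma 2, it is equivalent to show that for any $\mathbf{Q}^1, \mathbf{Q}^2 \in \mathcal{Q}$ such that $\mathbf{Q}^2 \succeq \mathbf{Q}^1$,

$$V_n(\mathbf{Q}^2) \ge V_n(\mathbf{Q}^1) \tag{42}$$

holds for all $n = 0, 1, \cdots$. First, we initialize $V_0(\mathbf{Q}) = 0$ for all $\mathbf{Q} \in \mathcal{Q}$. Thus, we have $V_0(\mathbf{Q}^1) = V_0(\mathbf{Q}^2) = 0$, i.e., (42) holds for $n = 0$. Assume that (42) holds for some $n \ge 0$. We will prove that (42) also holds for $n + 1$. By (39), we have

$$\begin{aligned} V_{n+1}(\mathbf{Q}^1) &= J_{n+1}(\mathbf{Q}^1, \mu_n^*(\mathbf{Q}^1)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n) \\ &\overset{(a)}{\le} J_{n+1}(\mathbf{Q}^1, \mu_n^*(\mathbf{Q}^2)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n) \\ &\overset{(b)}{=} \mathbb{E}[V_n(\mathbf{Q}^{1'})] + \sum_{m} Q^1_m + w_p p(\mu_n^*(\mathbf{Q}^2)) + w_f f(\mu_n^*(\mathbf{Q}^2)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n), \end{aligned} \tag{43}$$

where (a) follows from the optimality of $\mu_n^*(\mathbf{Q}^1)$ for $\mathbf{Q}^1$ in the $n$th iteration, (b) directly follows from (38) and $\mathbf{Q}^{1'} = (Q^{1'}_m)_{m \in \mathcal{M}}$ with $Q^{1'}_m = \min\{\mathbf{1}(\mu_n^*(\mathbf{Q}^2) \neq m) Q^1_m + A_m, N_m\}$. By (38) and (39), we also have

$$\begin{aligned} V_{n+1}(\mathbf{Q}^2) &= J_{n+1}(\mathbf{Q}^2, \mu_n^*(\mathbf{Q}^2)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n) \\ &= \mathbb{E}[V_n(\mathbf{Q}^{2'})] + \sum_{m} Q^2_m + w_p p(\mu_n^*(\mathbf{Q}^2)) + w_f f(\mu_n^*(\mathbf{Q}^2)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n), \end{aligned} \tag{44}$$

where $\mathbf{Q}^{2'} = (Q^{2'}_m)_{m \in \mathcal{M}}$ with $Q^{2'}_m = \min\{\mathbf{1}(\mu_n^*(\mathbf{Q}^2) \neq m) Q^2_m + A_m, N_m\}$. Then, we compare (43) and (44) term by term. Due to $\mathbf{Q}^2 \succeq \mathbf{Q}^1$, we have $\sum_m Q^2_m \ge \sum_m Q^1_m$ and $\mathbf{Q}^{2'} \succeq \mathbf{Q}^{1'}$, implying that $\mathbb{E}[V_n(\mathbf{Q}^{2'})] \ge \mathbb{E}[V_n(\mathbf{Q}^{1'})]$ by the induction hypothesis. Thus, we have $V_{n+1}(\mathbf{Q}^2) \ge V_{n+1}(\mathbf{Q}^1)$, i.e., (42) holds for $n + 1$. Therefore, by induction, we can show that (42) holds for any $n$. By taking limits on both sides of (42) and by (40), we complete the proof of Lemma 2.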


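Appendix C: Proof of Lemma 3

By (12), we have

$$\begin{aligned} &J(\mathbf{Q}, u) - J(\mathbf{Q}, v) - J(\mathbf{Q} + \mathbf{e}_u, u) + J(\mathbf{Q} + \mathbf{e}_u, v) \\ &= \mathbb{E}[V(\mathbf{Q}^{1'})] + g(\mathbf{Q}, u) - \mathbb{E}[V(\mathbf{Q}^{2'})] - g(\mathbf{Q}, v) - \mathbb{E}[V(\mathbf{Q}^{3'})] - g(\mathbf{Q} + \mathbf{e}_u, u) + \mathbb{E}[V(\mathbf{Q}^{4'})] + g(\mathbf{Q} + \mathbf{e}_u, v) \\ &\overset{(c)}{=} \mathbb{E}[V(\mathbf{Q}^{1'})] - \mathbb{E}[V(\mathbf{Q}^{2'})] - \mathbb{E}[V(\mathbf{Q}^{3'})] + \mathbb{E}[V(\mathbf{Q}^{4'})], \end{aligned} \tag{45}$$

where $\mathbf{Q} = (Q_m)_{m \in \mathcal{M}}$, $\mathbf{Q}^{i'} = (Q^{i'}_m)_{m \in \mathcal{M}}$, $i = 1, 2, 3, 4$ with

$$Q^{1'}_m = \min\{\mathbf{1}(u \neq m) Q_m + A_m, N_m\}, \tag{46a}$$

$$Q^{2'}_m = \min\{\mathbf{1}(v \neq m) Q_m + A_m, N_m\}, \tag{46b}$$

$$Q^{3'}_m = \begin{cases} \min\{A_u, N_u\} & \text{if } m = u \\ \min\{Q_m + A_m, N_m\} & \text{otherwise,} \end{cases} \tag{46c}$$

$$Q^{4'}_m = \begin{cases} \min\{Q_u + 1 + A_u, N_u\} & \text{if } m = u \\ \min\{\mathbf{1}(v \neq m) Q_m + A_m, N_m\} & \text{otherwise,} \end{cases} \tag{46d}$$

and (c) is due to

$$\begin{aligned} &g(\mathbf{Q}, u) - g(\mathbf{Q}, v) - g(\mathbf{Q} + \mathbf{e}_u, u) + g(\mathbf{Q} + \mathbf{e}_u, v) \\ &= \Big(\sum_m Q_m + w_p p(u) + w_f f(u)\Big) - \Big(\sum_m Q_m + w_p p(v) + w_f f(v)\Big) \\ &\quad - \Big(\sum_m Q_m + 1 + w_p p(u) + w_f f(u)\Big) + \Big(\sum_m Q_m + 1 + w_p p(v) + w_f f(v)\Big) = 0. \end{aligned} \tag{47}$$

To prove Lemma 3, it remains to show that the R.H.S. of (45) is nonnegative. By comparing (46a) with (46c), we can see that $Q^{1'}_m = Q^{3'}_m$ for all $m$, i.e., $\mathbf{Q}^{1'} = \mathbf{Q}^{3'}$. Thus, we have $\mathbb{E}[V(\mathbf{Q}^{1'})] = \mathbb{E}[V(\mathbf{Q}^{3'})]$. By comparing (46b) with (46d), we can see that $Q^{4'}_u \ge Q^{2'}_u$ and $Q^{4'}_m = Q^{2'}_m$ for all $m \neq u$, i.e., $\mathbf{Q}^{4'} \succeq \mathbf{Q}^{2'}$. Thus, by Lemma 2, we have $\mathbb{E}[V(\mathbf{Q}^{4'})] \ge \mathbb{E}[V(\mathbf{Q}^{2'})]$. Therefore, by (45), we have $J(\mathbf{Q} + \mathbf{e}_u, u) - J(\mathbf{Q} + \mathbf{e}_u, v) \le J(\mathbf{Q}, u) - J(\mathbf{Q}, v)$. We complete the proof of Lemma 3.

Appendix D: Proof of Theorem 1

Consider content $u \in \mathcal{M}$ and state $\mathbf{Q} = (Q_m)_{m \in \mathcal{M}}$ where $Q_u = s_u(\mathbf{Q}_{-u})$. Note that, if $s_u(\mathbf{Q}_{-u}) = \infty$, the switch condition in Theorem 1 holds trivially. Therefore, in the following, we only consider the case $s_u(\mathbf{Q}_{-u}) < \infty$. According to the definition of $s_u(\mathbf{Q}_{-u})$ in Theorem 1, we can see that $J(\mathbf{Q}, u) \le J(\mathbf{Q}, v)$ for all $v \in \mathcal{M}$ and $v \neq u$. Thus, it is optimal to multicast content $u$ for state $\mathbf{Q}$, i.e., $\mu^*(\mathbf{Q}) = u$. Consider another state $\mathbf{Q}' = (Q'_m)_{m \in \mathcal{M}}$ where $Q'_u \ge Q_u$ and $Q'_m = Q_m$ for all $m \neq u$. To prove Theorem 1, it is equivalent to show that $\mu^*(\mathbf{Q}') = u$. By Lemma 3, for all $v \in \mathcal{M}$ and $v \neq u$, we have

$$J(\mathbf{Q}', u) - J(\mathbf{Q}', v) \le J(\mathbf{Q}, u) - J(\mathbf{Q}, v) \le 0. \tag{48}$$

Thus, it is optimal to multicast content $u$ for $\mathbf{Q}'$. We complete the proof of Theorem 1.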
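The switch structure of Theorem 1 (once content $u$ is scheduled at some queue length $Q_u$, it stays scheduled for every larger $Q_u$ with the other queues fixed) can be checked mechanically on any tabulated policy. The sketch below is illustrative: `switch_thresholds` is a hypothetical helper, contents are indexed from 0, and the two example policies are hand-made stand-ins rather than policies computed from the MDP.

```python
from itertools import product
from math import inf


def switch_thresholds(policy, N):
    """Return {(u, Q_minus_u): s_u(Q_minus_u)}, or None if the policy
    violates the switch structure for some content u."""
    M = len(N)
    thresholds = {}
    for u in range(M):
        other_ranges = [range(N[m] + 1) for m in range(M) if m != u]
        for rest in product(*other_ranges):
            # Walk Q_u = 0..N_u with the other queues frozen at `rest`.
            chosen = []
            for q_u in range(N[u] + 1):
                Q = rest[:u] + (q_u,) + rest[u:]
                chosen.append(policy[Q] == u)
            s = chosen.index(True) if True in chosen else inf
            if s != inf and not all(chosen[s:]):
                return None  # content u is dropped again above its threshold
            thresholds[(u, rest)] = s
    return thresholds


N = (3, 3)
states = list(product(range(N[0] + 1), range(N[1] + 1)))

# Hypothetical "longest queue first" policy (ties go to content 0).
longest_first = {Q: max(range(2), key=lambda m: Q[m]) for Q in states}
# Hypothetical policy without a switch structure (depends on parity of Q_0).
parity = {Q: 0 if Q[0] % 2 == 1 else 1 for Q in states}

thr = switch_thresholds(longest_first, N)
bad = switch_thresholds(parity, N)
```

A threshold of `inf` corresponds to the case $s_u(\mathbf{Q}_{-u}) = \infty$ in the proof, where content $u$ is never scheduled for that configuration of the other queues.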


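Appendix E: Proof of Lemma 4

To prove the monotonically non-decreasing property of $s_2(Q_1)$ with respect to $Q_1$, it is equivalent to show that, if $\mu^*(\mathbf{Q} + \mathbf{e}_1) = 2$, then $\mu^*(\mathbf{Q}) = 2$. For this, it suffices to show

$$J(\mathbf{Q}, 2) - J(\mathbf{Q}, 1) \le J(\mathbf{Q} + \mathbf{e}_1, 2) - J(\mathbf{Q} + \mathbf{e}_1, 1), \tag{49}$$

where $\mathbf{Q} = (Q_1, Q_2)$ and $\mathbf{e}_1 = (1, 0)$.
By (12), we have

$$\begin{aligned} &J(\mathbf{Q}, 2) - J(\mathbf{Q}, 1) - J(\mathbf{Q} + \mathbf{e}_1, 2) + J(\mathbf{Q} + \mathbf{e}_1, 1) \\ &= \mathbb{E}[V(\mathbf{Q}^{1'})] + g(\mathbf{Q}, 2) - \mathbb{E}[V(\mathbf{Q}^{2'})] - g(\mathbf{Q}, 1) - \mathbb{E}[V(\mathbf{Q}^{3'})] - g(\mathbf{Q} + \mathbf{e}_1, 2) + \mathbb{E}[V(\mathbf{Q}^{4'})] + g(\mathbf{Q} + \mathbf{e}_1, 1) \\ &\overset{(d)}{=} \mathbb{E}[V(\mathbf{Q}^{1'})] - \mathbb{E}[V(\mathbf{Q}^{2'})] - \mathbb{E}[V(\mathbf{Q}^{3'})] + \mathbb{E}[V(\mathbf{Q}^{4'})], \end{aligned} \tag{50}$$

where

$$\mathbf{Q}^{1'} = (\min\{Q_1 + A_1, N_1\}, \min\{A_2, N_2\}), \tag{51a}$$

$$\mathbf{Q}^{2'} = (\min\{A_1, N_1\}, \min\{Q_2 + A_2, N_2\}), \tag{51b}$$

$$\mathbf{Q}^{3'} = (\min\{Q_1 + 1 + A_1, N_1\}, \min\{A_2, N_2\}), \tag{51c}$$

$$\mathbf{Q}^{4'} = (\min\{A_1, N_1\}, \min\{Q_2 + A_2, N_2\}), \tag{51d}$$

and (d) is due to

$$\begin{aligned} &g(\mathbf{Q}, 2) - g(\mathbf{Q}, 1) - g(\mathbf{Q} + \mathbf{e}_1, 2) + g(\mathbf{Q} + \mathbf{e}_1, 1) \\ &= (Q_1 + Q_2 + w_p p(2) + w_f f(2)) - (Q_1 + Q_2 + w_p p(1) + w_f f(1)) \\ &\quad - (Q_1 + Q_2 + 1 + w_p p(2) + w_f f(2)) + (Q_1 + Q_2 + 1 + w_p p(1) + w_f f(1)) = 0. \end{aligned} \tag{52}$$

To prove Lemma 4, it remains to show that the R.H.S. of (50) is nonpositive. By comparing (51a) with (51c), we have $\mathbf{Q}^{3'} \succeq \mathbf{Q}^{1'}$, implying that $\mathbb{E}[V(\mathbf{Q}^{3'})] \ge \mathbb{E}[V(\mathbf{Q}^{1'})]$ by Lemma 2. By comparing (51b) with (51d), we have $\mathbf{Q}^{2'} = \mathbf{Q}^{4'}$, implying that $\mathbb{E}[V(\mathbf{Q}^{2'})] = \mathbb{E}[V(\mathbf{Q}^{4'})]$. Thus, by (50), we can show that (49) holds.

Similarly, we can show that the following inequality holds:

$$J(\mathbf{Q}, 1) - J(\mathbf{Q}, 2) \le J(\mathbf{Q} + \mathbf{e}_2, 1) - J(\mathbf{Q} + \mathbf{e}_2, 2), \tag{53}$$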


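where $\mathbf{Q} = (Q_1, Q_2)$ and $\mathbf{e}_2 = (0, 1)$. Thus, if $\mu^*(\mathbf{Q} + \mathbf{e}_2) = 1$, then $\mu^*(\mathbf{Q}) = 1$. This implies the monotonically non-decreasing property of $s_1(Q_2)$ with respect to $Q_2$. We complete the proof of Lemma 4.

Appendix F: Proof of Proposition 1

Let $Z(N_1, N_2)$ denote the number of the policies with monotonically non-decreasing curves. By Theorem 1, either $s_2(Q_1)$ or $s_1(Q_2)$ is sufficient to characterize the optimal policy. Hence, we have

$$Z(N_1, N_2) = \sum_{a_{N_1}=0}^{N_2+1} \sum_{a_{N_1-1}=0}^{a_{N_1}} \cdots \sum_{a_1=0}^{a_2} \sum_{a_0=0}^{a_1} 1 \tag{54}$$

$$= \sum_{b_{N_2}=0}^{N_1+1} \sum_{b_{N_2-1}=0}^{b_{N_2}} \cdots \sum_{b_1=0}^{b_2} \sum_{b_0=0}^{b_1} 1, \tag{55}$$

where (54) and (55) are the number of all possible $s_2(Q_1)$ and $s_1(Q_2)$, respectively. In the following, we shall show that

$$Z(N_1, N_2) = \binom{N_1 + N_2 + 2}{N_1 + 1} \tag{56}$$

holds for any positive integers $N_1, N_2$.

We use induction on $n = N_1 + N_2 \ge 2$. If $n = 2$, then $N_1 = N_2 = 1$ and we have $Z(1, 1) = \sum_{a_1=0}^{2} \sum_{a_0=0}^{a_1} 1 = 6 = \binom{4}{2}$. Assume (56) holds for any positive integers $N_1, N_2$ with $N_1 + N_2 = n \ge 2$. Now consider $Z(N_1, N_2)$ with $N_1 + N_2 = n + 1$. If $N_1 = 1$, then by (54), we have $Z(1, N_2) = \sum_{a_1=0}^{N_2+1} \sum_{a_0=0}^{a_1} 1 = \binom{N_2 + 3}{2}$. If $N_2 = 1$, then by (55), we have $Z(N_1, 1) = \sum_{b_1=0}^{N_1+1} \sum_{b_0=0}^{b_1} 1 = \binom{N_1 + 3}{N_1 + 1}$. If $N_1, N_2 > 1$, then by (54) and the induction hypothesis, we have $Z(N_1, N_2) = Z(N_1 - 1, N_2) + Z(N_1, N_2 - 1) = \binom{N_1 + N_2 + 1}{N_1} + \binom{N_1 + N_2 + 1}{N_1 + 1} = \binom{N_1 + N_2 + 2}{N_1 + 1}$. Thus, (56) holds whenever $N_1 + N_2 = n + 1$. Therefore, by induction, (56) holds for any positive integers $N_1, N_2$. We complete the proof of Proposition 1.

Appendix G: Proof of Lemma 5

We prove Lemma 5 through mathematical induction using the RVIA in Appendix B. Denote $\mathbf{Q}^1 = (Q^1_{m,k})_{m \in \mathcal{M}, k \in \mathcal{K}}$ and $\mathbf{Q}^2 = (Q^2_{m,k})_{m \in \mathcal{M}, k \in \mathcal{K}}$. To prove Lemma 5, by (40), it is equivalent to show that for any $\mathbf{Q}^1, \mathbf{Q}^2 \in \mathcal{Q}$ such that $\mathbf{Q}^2 \succeq \mathbf{Q}^1$,

$$V_n(\mathbf{Q}^2) \ge V_n(\mathbf{Q}^1) \tag{57}$$

holds for all $n = 0, 1, \cdots$. We initialize $V_0(\mathbf{Q}) = 0$ for all $\mathbf{Q} \in \mathcal{Q}$. Thus, we have $V_0(\mathbf{Q}^1) = V_0(\mathbf{Q}^2) = 0$, i.e., (57) holds for $n = 0$. Assume that (57) holds for some $n \ge 0$. We will prove that (57) also holds for $n + 1$. By (39), we have

$$\begin{aligned} V_{n+1}(\mathbf{Q}^1) &\overset{(e)}{\le} J_{n+1}(\mathbf{Q}^1, \mu_n^*(\mathbf{Q}^2)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n) \\ &\overset{(f)}{=} \mathbb{E}[V_n(\mathbf{Q}^{1'})] + \sum_{m,k} Q^1_{m,k} + w_p p\big(\mu_n^*(\mathbf{Q}^2), k^{\dagger}(\mathbf{Q}^1, \mu_n^*(\mathbf{Q}^2))\big) + w_f f(\mu_n^*(\mathbf{Q}^2)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n), \end{aligned} \tag{58}$$

where (e) follows from the optimality of $\mu_n^*(\mathbf{Q}^1)$ for $\mathbf{Q}^1$ in the $n$th iteration, (f) directly follows from (38), $k^{\dagger}(\mathbf{Q}^1, \mu_n^*(\mathbf{Q}^2)) = \max\{k \mid Q^1_{\mu_n^*(\mathbf{Q}^2), k} > 0\}$ and $\mathbf{Q}^{1'} = (Q^{1'}_{m,k})_{m \in \mathcal{M}, k \in \mathcal{K}}$ with $Q^{1'}_{m,k} = \min\{\mathbf{1}(\mu_n^*(\mathbf{Q}^2) \neq m) Q^1_{m,k} + A_{m,k}, N_{m,k}\}$. By (38) and (39), we also have

$$\begin{aligned} V_{n+1}(\mathbf{Q}^2) &= J_{n+1}(\mathbf{Q}^2, \mu_n^*(\mathbf{Q}^2)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n) \\ &= \mathbb{E}[V_n(\mathbf{Q}^{2'})] + \sum_{m,k} Q^2_{m,k} + w_p p\big(\mu_n^*(\mathbf{Q}^2), k^{\dagger}(\mathbf{Q}^2, \mu_n^*(\mathbf{Q}^2))\big) + w_f f(\mu_n^*(\mathbf{Q}^2)) - \min_{u_n} J_{n+1}(\mathbf{Q}^{\S}, u_n), \end{aligned} \tag{59}$$

where $k^{\dagger}(\mathbf{Q}^2, \mu_n^*(\mathbf{Q}^2)) = \max\{k \mid Q^2_{\mu_n^*(\mathbf{Q}^2), k} > 0\}$ and $\mathbf{Q}^{2'} = (Q^{2'}_{m,k})_{m \in \mathcal{M}, k \in \mathcal{K}}$ with $Q^{2'}_{m,k} = \min\{\mathbf{1}(\mu_n^*(\mathbf{Q}^2) \neq m) Q^2_{m,k} + A_{m,k}, N_{m,k}\}$.

Next, we compare (58) and (59) term by term. Due to $\mathbf{Q}^2 \succeq \mathbf{Q}^1$, we have $\mathbf{Q}^{2'} \succeq \mathbf{Q}^{1'}$. Thus, by the induction hypothesis, we have $\mathbb{E}[V_n(\mathbf{Q}^{2'})] \ge \mathbb{E}[V_n(\mathbf{Q}^{1'})]$. Due to $\mathbf{Q}^2 \succeq \mathbf{Q}^1$, we have $\sum_{m,k} Q^2_{m,k} \ge \sum_{m,k} Q^1_{m,k}$ and $k^{\dagger}(\mathbf{Q}^2, \mu_n^*(\mathbf{Q}^2)) \ge k^{\dagger}(\mathbf{Q}^1, \mu_n^*(\mathbf{Q}^2))$, implying that $p\big(\mu_n^*(\mathbf{Q}^2), k^{\dagger}(\mathbf{Q}^2, \mu_n^*(\mathbf{Q}^2))\big) \ge p\big(\mu_n^*(\mathbf{Q}^2), k^{\dagger}(\mathbf{Q}^1, \mu_n^*(\mathbf{Q}^2))\big)$. Thus, we have $V_{n+1}(\mathbf{Q}^2) \ge V_{n+1}(\mathbf{Q}^1)$, i.e., (57) holds for $n + 1$. Therefore, by induction, we can show that (57) holds for any $n$. By taking limits on both sides of (57) and by (40), we complete the proof of Lemma 5.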
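Returning to the counting argument of Proposition 1 in Appendix F, the closed form (56) can be sanity-checked numerically by evaluating the nested sum (54) directly. The sketch below is only an illustration; the function names `nested_sum` and `Z` are chosen here and do not come from the paper.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def nested_sum(level, upper):
    """Sum over a_level = 0..upper of the inner sums; the innermost summand is 1."""
    if level < 0:
        return 1
    return sum(nested_sum(level - 1, a) for a in range(upper + 1))


def Z(N1, N2):
    """Evaluate (54): a_{N1} runs over 0..N2+1, then a_{N1-1} over 0..a_{N1},
    and so on down to a_0 over 0..a_1."""
    return nested_sum(N1, N2 + 1)


# Compare the nested sum with the binomial closed form (56) on a small grid.
matches = all(
    Z(n1, n2) == comb(n1 + n2 + 2, n1 + 1)
    for n1 in range(1, 7)
    for n2 in range(1, 7)
)
```

The recursion also makes the induction step visible: splitting the outermost sum into the term with $a_{N_1} = N_2 + 1$ and the terms with $a_{N_1} \le N_2$ gives $Z(N_1, N_2) = Z(N_1 - 1, N_2) + Z(N_1, N_2 - 1)$.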
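Appendix H: Proof of Lemma 6

By the definition of the state-action cost function $J(\mathbf{Q}, u)$, we have

$$\begin{aligned} &J(\mathbf{Q}, u) - J(\mathbf{Q}, v) - J(\mathbf{Q} + \mathbf{E}_{u,k}, u) + J(\mathbf{Q} + \mathbf{E}_{u,k}, v) \\ &= \mathbb{E}[V(\mathbf{Q}^{1'})] + g(\mathbf{Q}, u) - \mathbb{E}[V(\mathbf{Q}^{2'})] - g(\mathbf{Q}, v) - \mathbb{E}[V(\mathbf{Q}^{3'})] - g(\mathbf{Q} + \mathbf{E}_{u,k}, u) + \mathbb{E}[V(\mathbf{Q}^{4'})] + g(\mathbf{Q} + \mathbf{E}_{u,k}, v) \\ &= \mathbb{E}[V(\mathbf{Q}^{1'})] - \mathbb{E}[V(\mathbf{Q}^{3'})] + \mathbb{E}[V(\mathbf{Q}^{4'})] - \mathbb{E}[V(\mathbf{Q}^{2'})] \\ &\quad + w_p \big( p(u, k^{\dagger}(\mathbf{Q}, u)) - p(u, k^{\dagger}(\mathbf{Q} + \mathbf{E}_{u,k}, u)) \big) - w_p \big( p(v, k^{\dagger}(\mathbf{Q}, v)) - p(v, k^{\dagger}(\mathbf{Q} + \mathbf{E}_{u,k}, v)) \big), \end{aligned} \tag{60}$$

where $\mathbf{Q} = (Q_{m,i})_{m \in \mathcal{M}, i \in \mathcal{K}}$ and $\mathbf{Q}^{j'} = (Q^{j'}_{m,i})_{m \in \mathcal{M}, i \in \mathcal{K}}$, $j = 1, 2, 3, 4$ with

$$Q^{1'}_{m,i} = \min\{\mathbf{1}(u \neq m) Q_{m,i} + A_{m,i}, N_{m,i}\}, \tag{61a}$$

$$Q^{2'}_{m,i} = \min\{\mathbf{1}(v \neq m) Q_{m,i} + A_{m,i}, N_{m,i}\}, \tag{61b}$$

$$Q^{3'}_{m,i} = \begin{cases} \min\{A_{u,k}, N_{u,k}\} & \text{if } m = u, i = k \\ \min\{\mathbf{1}(u \neq m) Q_{m,i} + A_{m,i}, N_{m,i}\} & \text{otherwise,} \end{cases} \tag{61c}$$

$$Q^{4'}_{m,i} = \begin{cases} \min\{Q_{u,k} + 1 + A_{u,k}, N_{u,k}\} & \text{if } m = u, i = k \\ \min\{\mathbf{1}(v \neq m) Q_{m,i} + A_{m,i}, N_{m,i}\} & \text{otherwise.} \end{cases} \tag{61d}$$

To prove Lemma 6, it remains to show that the R.H.S. of (60) is nonnegative. By comparing (61a) with (61c), we can see that $Q^{1'}_{m,i} = Q^{3'}_{m,i}$ for all $m, i$, i.e., $\mathbf{Q}^{1'} = \mathbf{Q}^{3'}$. Thus, we have $\mathbb{E}[V(\mathbf{Q}^{1'})] = \mathbb{E}[V(\mathbf{Q}^{3'})]$. By comparing (61b) with (61d), we can see that $Q^{4'}_{u,k} \ge Q^{2'}_{u,k} \ge 0$ and $Q^{4'}_{m,i} = Q^{2'}_{m,i}$ for all $(m, i) \neq (u, k)$, i.e., $\mathbf{Q}^{4'} \succeq \mathbf{Q}^{2'}$. Thus, by Lemma 5, we have $\mathbb{E}[V(\mathbf{Q}^{4'})] \ge \mathbb{E}[V(\mathbf{Q}^{2'})]$. In addition, due to $\mathbf{Q} + \mathbf{E}_{u,k} \succeq \mathbf{Q}$, we have $k \le \max\{k \mid Q_{u,k} > 0\}$. Then, we have $k^{\dagger}(\mathbf{Q} + \mathbf{E}_{u,k}, u) = k^{\dagger}(\mathbf{Q}, u)$ and $k^{\dagger}(\mathbf{Q} + \mathbf{E}_{u,k}, v) = k^{\dagger}(\mathbf{Q}, v)$, implying that $p(u, k^{\dagger}(\mathbf{Q}, u)) = p(u, k^{\dagger}(\mathbf{Q} + \mathbf{E}_{u,k}, u))$ and $p(v, k^{\dagger}(\mathbf{Q}, v)) = p(v, k^{\dagger}(\mathbf{Q} + \mathbf{E}_{u,k}, v))$. Therefore, by (60), we have $J(\mathbf{Q} + \mathbf{E}_{u,k}, u) - J(\mathbf{Q} + \mathbf{E}_{u,k}, v) \le J(\mathbf{Q}, u) - J(\mathbf{Q}, v)$. We complete the proof of Lemma 6.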
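The step in this proof that makes the power terms in (60) cancel is that an extra request at user $k \le \max\{k \mid Q_{u,k} > 0\}$ leaves the index $k^{\dagger}$ unchanged. A minimal sketch of this index, under assumptions made only for illustration (users numbered 1 to K, the state stored as a list of per-content lists, and a return value of 0 when content u has no pending request), is:

```python
def k_dagger(Q, u):
    """k^dagger(Q, u) = max{k | Q[u][k-1] > 0} with users numbered 1..K.
    Returns 0 when content u has no pending request (an assumed convention)."""
    pending = [k for k, q in enumerate(Q[u], start=1) if q > 0]
    return max(pending) if pending else 0


def add_request(Q, u, k):
    """Return Q + E_{u,k}: one more pending request for content u at user k."""
    out = [list(row) for row in Q]
    out[u][k - 1] += 1
    return out


# Hypothetical state with 2 contents and 4 users.
Q = [[1, 0, 2, 0],
     [0, 1, 0, 0]]

same = k_dagger(add_request(Q, 0, 2), 0)     # k = 2 <= k_dagger(Q, 0) = 3
changed = k_dagger(add_request(Q, 0, 4), 0)  # k = 4 >  k_dagger(Q, 0) = 3
```

When the new request arrives at a user with index above the current maximum, the index moves and the power terms no longer cancel, which is the situation in which Lemma 6 is not applied.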

Appendix I: Proof of Theorem 2

Consider content $u \in \mathcal{M}$, user $k \in \mathcal{K}$ and state $\mathbf{Q} = (Q_{m,i})_{m \in \mathcal{M}, i \in \mathcal{K}}$ where $Q_{u,k} = s_{u,k}(\mathbf{Q}_{-u,-k})$. Note that, if $s_{u,k}(\mathbf{Q}_{-u,-k}) = \infty$, the partial switch condition in Theorem 2 holds trivially. Therefore, in the following, we only consider the case $s_{u,k}(\mathbf{Q}_{-u,-k}) < \infty$. According to the definition of $s_{u,k}(\mathbf{Q}_{-u,-k})$ in Theorem 2, we can see that $J(\mathbf{Q}, u) \le J(\mathbf{Q}, v)$ for all $v \in \mathcal{M}$, $v \neq u$. Thus, it is optimal to multicast content $u$ for state $\mathbf{Q}$, i.e., $\mu^*(\mathbf{Q}) = u$. Consider another state $\mathbf{Q}' = (Q'_{m,i})_{m \in \mathcal{M}, i \in \mathcal{K}}$ where $Q'_{u,k} \ge Q_{u,k}$ and $Q'_{m,i} = Q_{m,i}$ for all $(m, i) \neq (u, k)$. To prove Theorem 2, it is equivalent to show that it is also optimal to multicast content $u$ for state $\mathbf{Q}'$, i.e.,

$$J(\mathbf{Q}', u) \le J(\mathbf{Q}', v), \quad \forall v \in \mathcal{M}, v \neq u. \tag{62}$$

According to the relationship between $k$ and $k^{\dagger}(k, \mathbf{Q}_u)$ as well as the value of $s_{u,k}(\mathbf{Q}_{-u,-k})$, we have the following three cases.

(1) If $k < k^{\dagger}(k, \mathbf{Q}_u)$, i.e., condition (a) holds, we have $k \le \max\{k \mid Q_{u,k} > 0\}$. By Lemma 6, for any $v \in \mathcal{M}$ and $v \neq u$, we have

$$J(\mathbf{Q}', u) - J(\mathbf{Q}', v) \le J(\mathbf{Q}, u) - J(\mathbf{Q}, v) \le 0. \tag{63}$$

Thus, it is optimal to multicast content $u$ for state $\mathbf{Q}'$.

(2) If $k > k^{\dagger}(k, \mathbf{Q}_u)$ and $s_{u,k}(\mathbf{Q}_{-u,-k}) > 0$, i.e., condition (b) holds, we have $k = \max\{k \mid Q_{u,k} > 0\}$. By Lemma 6, for any $v \in \mathcal{M}$ and $v \neq u$, (63) also holds. Thus, it is optimal to multicast content $u$ for state $\mathbf{Q}'$.

(3) If $k > k^{\dagger}(k, \mathbf{Q}_u)$ and $s_{u,k}(\mathbf{Q}_{-u,-k}) = 0$, implying that $k > \max\{k \mid Q_{u,k} > 0\}$, then Lemma 6 does not apply and it is unknown whether (63) holds. Therefore, it is unclear whether it is optimal to multicast content $u$ for state $\mathbf{Q}'$.

We complete the proof of Theorem 2.

Appendix J: Proof of Lemma 7

Following the proof in [21], we shall prove the additive property w.r.t. the value function. First, we have $g(\mathbf{Q}, u) = \sum_{m \in \mathcal{M}} g_m(Q_m, u)$. Second, by the relationship between the joint distribution and the marginal distribution, we have $\sum_{\mathbf{Q}' \in \mathcal{Q}} \Pr[\mathbf{Q}' \mid \mathbf{Q}, u] V_m(Q'_m) = \sum_{Q'_m \in \mathcal{Q}_m} \Pr[Q'_m \mid Q_m, u] V_m(Q'_m)$. Therefore, by substituting $\theta = \sum_{m \in \mathcal{M}} \theta_m$ and $V(\mathbf{Q}) = \sum_{m \in \mathcal{M}} V_m(Q_m)$ into (21), we can see that the equality holds, which completes the proof.

Appendix K: Proof of Theorem 4

We first prove the structural property of $\hat{\mu}^*$ for the uniform case. First, we show that for all $m \in \mathcal{M}$, the per-content value function $V_m(Q_m)$ satisfies

$$V_m(Q_m^2) \ge V_m(Q_m^1), \tag{64}$$

for any $Q_m^1, Q_m^2 \in \mathcal{Q}_m$ such that $Q_m^2 \ge Q_m^1$. For each $Q_m \in \mathcal{Q}_m$, in the $n$th iteration, RVIA updates $V_m^{n+1}(Q_m)$ according to

$$\begin{aligned} V_m^{n+1}(Q_m) &= \mathbb{E}^{\hat{\mu}}[g_m(Q_m, u)] + \sum_{Q'_m \in \mathcal{Q}_m} \mathbb{E}^{\hat{\mu}}\big[\Pr[Q'_m \mid Q_m, u]\big] V_m^n(Q'_m) \\ &\quad - \Big( \mathbb{E}^{\hat{\mu}}[g_m(Q_m^{\S}, u)] + \sum_{Q'_m \in \mathcal{Q}_m} \mathbb{E}^{\hat{\mu}}\big[\Pr[Q'_m \mid Q_m^{\S}, u]\big] V_m^n(Q'_m) \Big), \end{aligned} \tag{65}$$

where $Q_m^{\S} \in \mathcal{Q}_m$ is some fixed state. Following the proof of Lemma 2, we can show that for any $Q_m^1, Q_m^2 \in \mathcal{Q}_m$ such that $Q_m^2 \ge Q_m^1$, we have $V_m^n(Q_m^2) \ge V_m^n(Q_m^1)$ for all $n = 0, 1, \cdots$. Therefore, we can show that (64) holds through mathematical induction using RVIA. Then, following the proof of Lemma 3, we can show that for any $u, v \in \mathcal{M}$ and $u \neq v$, $J(\mathbf{Q}, u) - J(\mathbf{Q}, v)$ is monotonically non-increasing with $Q_u$, i.e.,

$$J(\mathbf{Q} + \mathbf{e}_u, u) - J(\mathbf{Q} + \mathbf{e}_u, v) \le J(\mathbf{Q}, u) - J(\mathbf{Q}, v). \tag{66}$$

Finally, following the proof of Theorem 1, we can show that $\hat{\mu}^*$ has a switch structure in the uniform case. We complete the proof of Part 1) in Theorem 4.

Next, we prove the structural property of $\hat{\mu}^*$ for the nonuniform case. The procedure is similar to that for the uniform case. First, similar to the proofs for (64) and Lemma 5, we can show that for all $m \in \mathcal{M}$, the per-content value function $V_m(\mathbf{Q}_m)$ satisfies

$$V_m(\mathbf{Q}_m^2) \ge V_m(\mathbf{Q}_m^1) \tag{67}$$

for any $\mathbf{Q}_m^1, \mathbf{Q}_m^2 \in \mathcal{Q}_m$ such that $\mathbf{Q}_m^2 \succeq \mathbf{Q}_m^1$. Then, following the proof of Lemma 6, we can show that for any $u, v \in \mathcal{M}$ and $u \neq v$, $k \le \max\{k \mid Q_{u,k} > 0\}$,

$$J(\mathbf{Q} + \mathbf{E}_{u,k}, u) - J(\mathbf{Q} + \mathbf{E}_{u,k}, v) \le J(\mathbf{Q}, u) - J(\mathbf{Q}, v). \tag{68}$$

Finally, following the proof of Theorem 2, we can show that $\hat{\mu}^*$ has a partial switch structure in the nonuniform case. We complete the proof of Part 2) in Theorem 4.